Abstract

Recently data trees and data words have received a considerable
amount of attention in connection with XML reasoning and system
verification. These are trees or words that, in addition to labels from a
finite alphabet, carry data values from an infinite alphabet (data). In
general it is rather hard to obtain logics for data words and trees that
are sufficiently expressive, but still have reasonable complexity for the
satisfiability problem. In this paper we extend and study the notion of
Büchi automata for $\omega$-words with data. We prove that the emptiness
problem for such an extension is decidable in elementary complexity. We
then apply our result to show the decidability of two kinds of logics
for $\omega$-words with data: the two-variable fragment of first-order logic
and some extensions of classical linear temporal logic for words with
data.



1 Introduction 

The classical theory of automata and formal languages deals primarily with 
languages over finite alphabets. A natural extension of formal languages, 
regular or context-free, is one that permits the alphabet to be infinite [HO [21
[5l [131 [in |T7j . Most of the extensions, however, lack the usual nice
decidability properties of automata over finite alphabets, unless strong
restrictions are imposed.

*We acknowledge the financial support by the European FET-Open Project FoX (grant
agreement 233599) and the German DFG (grant SCHW 678/4-1).

Recently the subject of languages over infinite alphabets has received much
attention due to its connection with XML reasoning and system
specification. The most natural model for XML documents is labeled unranked trees,
in which each node has a label from a finite alphabet. Thus, standard
techniques in automata theory can be applied [151 HSl [HI- However, real XML
documents carry data, which usually come from an infinite set, and it is
essential to reason about those data values. Thus, there is a need to look
for decidable formalisms in the presence of a second, infinite alphabet.

A similar scenario may happen in system specification where $\omega$-words
(words of infinite length) are used to describe system behaviors. In this
case a position in the word represents a point in time, while the label of
the position indicates the atomic propositions that hold at that time. The
number of atomic propositions is usually finite, and thus they can
be encoded as a finite alphabet. The most common tool for reasoning about
$\omega$-words is arguably Büchi automata, due to their expressiveness and the low
complexity of their standard decision problems. For example, they capture
the so-called monadic second-order (MSO) logic, and hence specification
languages such as Linear Temporal Logic (LTL) and the $\mu$-calculus. However,
the behaviour of many systems includes properties that cannot be captured
by finite alphabets. A typical example is reasoning about the contents of
variables that store values from infinite domains like the integers or
strings. Thus, it is also natural to look for formalisms that allow us to
reason about $\omega$-words with data values that come from an infinite domain.

Our focus in this paper is data $\omega$-words, that is, $\omega$-words in which each
position also carries a data value from an infinite alphabet. Looking at the
literature [21 [3l [1 IS H HOl [131 [Illlll [IT] one can immediately notice that
decidable formalisms for data $\omega$-words are hard to obtain, unless strong
restrictions are imposed. Nevertheless, some significant progress has been
made recently [3l UHl IS] . A deep result in [3] shows that the restriction of
first-order logic to its two-variable fragment, FO$^2$, remains decidable over
data $\omega$-words. The pioneering works on Linear Temporal Logic for $\omega$-words
with data are the papers [10] and [9]. In [9] an extension of Linear Temporal
Logic (LTL) to handle data values is proposed and its satisfiability problem
is shown to be decidable. In the papers [3, 9] the satisfiability problem, even
though it is decidable, has no known upper bound on its complexity. The decidability
is obtained by reducing the satisfiability problem to the reachability
problem in Petri nets, the precise complexity of which has been open for many
years, though it is known to be EXPSpace-hard. In the paper [10] the logic
is decidable, but not primitive recursive, for finite data words, while it
becomes undecidable for $\omega$-words. The paper [9] also contains a logic which is
decidable in PSpace. However, this logic has quite limited expressive power:
the finite alphabet for the labels consists of only a single symbol.

In this paper we propose and study an extension of Büchi automata with
a formalism to specify constraints on data values. Roughly, those constraints
are inspired by database theory and are called key-, inclusion- and denial-constraints.
A key-constraint states that no two positions labeled with the same symbol
$a$ have the same data value; an inclusion-constraint states that every data value
found in a position with label $a$ is also found in a position with label $b$; while
a denial-constraint states that the sets of data values found in positions with
labels $a$ and $b$ are disjoint. Those constraints are very common in database
theory. We show that the emptiness problem for such an extension is decidable
in NEXPTime, whereas if there is no key-constraint, then the complexity
drops to NP. We then apply our results to show the decidability of two kinds
of logics for data $\omega$-words: the two-variable fragment of first-order logic and
some extensions of classical linear temporal logic for data $\omega$-words. Both
have elementary complexity.

The vocabulary for the two-variable logic that we consider here has only
the successor relation on the positions in the $\omega$-word and the data equality,
in addition to the finite number of unary predicates for the finite labeling.
In [3] the vocabulary includes the order on the positions in the $\omega$-words
and, as mentioned earlier, the satisfiability problem for the two-variable logic
becomes at least as hard as the reachability problem for Petri nets.

Another work related to ours is the remarkable result in [2],
which shows that the two-variable fragment of first-order logic over finite
unranked data trees, with a vocabulary consisting of successor and data equality,
is decidable in 3-NEXPTime. Another proof, with a different approach, for the
restricted case of finite data words was later obtained in [8].

The paper is organized as follows. In Section [2] we define the notations
and tools that we are going to use in this paper. In Section [3] we introduce
the extension of Büchi automata by equipping it with data-constraints and
we prove that the emptiness problem is decidable in elementary complexity.
We call this model Büchi automata with data-constraints (ADC). In
Section [4] we further extend ADC with operators for comparing the equality
between neighboring data values, which we call profile Büchi automata with
data-constraints. The emptiness problem for this model is also decidable
in elementary complexity. Then in Section [5] we present a decision
procedure for the satisfiability problem of the two-variable fragment of first-order
logic. Finally in Section [6] we introduce a version of Linear Temporal Logic
(LTL) that is equipped with some operators for data value comparisons. For
this logic we also prove that the satisfiability problem is decidable in elementary
complexity.

Acknowledgement We thank Claire David, Leonid Libkin and Thomas 
Schwentick for fruitful discussions. 

2 Notations 

2.1 Data words 

Let $\Sigma$ be a finite alphabet and $D$ an infinite set of data values. A finite word
is an element of $\Sigma^*$, while an $\omega$-word is an element of $\Sigma^\omega$. A finite data word
is an element of $(\Sigma \times D)^*$, while a data $\omega$-word is an element of $(\Sigma \times D)^\omega$.

We write a data (finite or $\omega$-) word $w$ as $\binom{a_1}{d_1}\binom{a_2}{d_2}\cdots$, where $a_1, a_2, \ldots \in \Sigma$
and $d_1, d_2, \ldots \in D$. The symbol $a_i$ is the label of position $i$, while the value
$d_i$ is the data value in position $i$. The projection of $w$ to the alphabet $\Sigma$ is
denoted by $\mathrm{Proj}(w) = a_1 a_2 \cdots$. A position in $w$ is called an $a$-position if its
label is $a$. We denote by $V_w(a)$ the set of data values found in $a$-positions
in $w$, i.e., $V_w(a) = \{d_i \mid a_i = a\}$, for each $a \in \Sigma$. Note that some of the sets $V_w(a)$
may be infinite, while others are finite.
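The definitions above can be illustrated on a finite data word. The following is a minimal sketch, assuming a finite data word is represented as a Python list of (label, data value) pairs; the function names `proj` and `values_by_label` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

def proj(word):
    """Projection of a finite data word onto its sequence of labels."""
    return [label for label, _ in word]

def values_by_label(word):
    """Map each label a to V_w(a), the set of data values at a-positions."""
    v = defaultdict(set)
    for label, value in word:
        v[label].add(value)
    return dict(v)

# A finite data word with labels from {a, b} and integer data values.
w = [("a", 1), ("b", 2), ("a", 3), ("b", 1)]
print(proj(w))             # labels only
print(values_by_label(w))  # V_w(a) and V_w(b)
```

For an $\omega$-word the same computation can only be applied to finite prefixes, which is why some sets $V_w(a)$ may be infinite in the limit.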

2.2 Data-constraints: constraints on the data values 

There are three kinds of data-constraints over the alphabet $\Sigma$:

1. key-constraints, written in the form $V(a) \mapsto a$, where $a \in \Sigma$.

2. inclusion-constraints, written in the form $V(a) \subseteq \bigcup_{b \in R} V(b)$, where
$a \in \Sigma$ and $R \subseteq \Sigma$.

3. denial-constraints, written in the form $V(a) \cap V(b) = \emptyset$, where $a, b \in \Sigma$.

Whether a data word $w$ satisfies a data-constraint $C$, written as $w \models C$, is
defined as follows.

1. $w \models V(a) \mapsto a$, if every two $a$-positions in $w$ have different data values.

2. $w \models V(a) \subseteq \bigcup_{b \in R} V(b)$, if $V_w(a) \subseteq \bigcup_{b \in R} V_w(b)$.

3. $w \models V(a) \cap V(b) = \emptyset$, if $V_w(a) \cap V_w(b) = \emptyset$.

If $\mathcal{C}$ is a collection of data-constraints, then we write $w \models \mathcal{C}$, if $w \models C$ for
all $C \in \mathcal{C}$.
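The three kinds of data-constraints can be checked directly on a finite data word. The sketch below assumes data words are lists of (label, data value) pairs and encodes constraints as tuples tagged "key", "inclusion" or "denial"; this encoding and the names `values` and `satisfies` are assumptions made for illustration only.

```python
def values(word, label):
    """V_w(label): data values found at positions carrying this label."""
    return {d for (a, d) in word if a == label}

def satisfies(word, constraint):
    """Check w |= C for a single data-constraint C on a finite data word."""
    kind = constraint[0]
    if kind == "key":          # V(a) |-> a
        _, a = constraint
        a_values = [d for (lab, d) in word if lab == a]
        return len(a_values) == len(set(a_values))
    if kind == "inclusion":    # V(a) subset of the union of V(b), b in R
        _, a, r = constraint
        union = set().union(*(values(word, b) for b in r))
        return values(word, a) <= union
    if kind == "denial":       # V(a) and V(b) are disjoint
        _, a, b = constraint
        return values(word, a).isdisjoint(values(word, b))
    raise ValueError(f"unknown constraint kind: {kind}")

def satisfies_all(word, constraints):
    """Check w |= C for every C in a collection of constraints."""
    return all(satisfies(word, c) for c in constraints)

w = [("a", 1), ("b", 1), ("a", 2), ("b", 2), ("c", 3)]
```

On this example word, the key-constraint on a and the inclusion of V(a) in V(b) hold, while the denial-constraint between a and b fails because both labels carry the value 1.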

2.3 Transition systems and Büchi automata

A transition system over the alphabet $\Sigma$ is a tuple $\mathcal{M} = (Q, \mu)$, where $Q$ is
a finite set of states and $\mu \subseteq Q \times \Sigma \times Q$ is the set of transitions.

A Büchi automaton $\mathcal{A}$ over the alphabet $\Sigma$ is simply a transition system
$\mathcal{M}$ with a designated initial state $q_0$ and a set $F \subseteq Q$ of final states. In such a
case, we write $\mathcal{A} = \mathcal{M}_{q_0}^{F}$ and the system $\mathcal{M}$ is called the transition system
of $\mathcal{A}$.

A run of $\mathcal{A}$ on an $\omega$-word $w = a_1 a_2 \cdots$ is a sequence $\rho = p_1 p_2 \cdots$ of states
in $Q$ such that $(q_0, a_1, p_1) \in \mu$ and $(p_i, a_{i+1}, p_{i+1}) \in \mu$, for each $i = 1, 2, \ldots$.
Note that we exclude the initial state in the run $\rho$. This is done for our
convenience of indexing.

Let $\mathrm{Inf}(\rho)$ denote the set of states that appear infinitely many times in
$\rho$. The run $\rho$ is accepting, if $\mathrm{Inf}(\rho) \cap F \neq \emptyset$. An $\omega$-word $w \in \mathcal{L}(\mathcal{A})$, if
there exists an accepting run of $\mathcal{A}$ on $w$. As usual, $\mathcal{L}(\mathcal{A})$ denotes the set of
$\omega$-words accepted by the automaton $\mathcal{A}$.
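Since later sections test Büchi automata for emptiness, a small sketch of the standard non-emptiness test may help. It uses the textbook characterization (not taken from this paper) that the language is non-empty exactly when some final state is reachable from the initial state and lies on a cycle. Transitions are assumed to be given as a set of (state, letter, state) triples; the function names are hypothetical.

```python
def reachable(mu, start):
    """States reachable from `start` in one or more transitions."""
    succ = {}
    for (p, _letter, q) in mu:
        succ.setdefault(p, set()).add(q)
    seen, stack = set(), list(succ.get(start, ()))
    while stack:
        q = stack.pop()
        if q not in seen:
            seen.add(q)
            stack.extend(succ.get(q, ()))
    return seen

def buchi_nonempty(mu, q0, final):
    """True iff some final state is reachable from q0 and lies on a cycle."""
    from_q0 = reachable(mu, q0) | {q0}
    return any(f in from_q0 and f in reachable(mu, f) for f in final)

mu = {("q0", "a", "q1"), ("q1", "b", "q1")}
print(buchi_nonempty(mu, "q0", {"q1"}))
print(buchi_nonempty(mu, "q0", {"q0"}))
```

This naive version recomputes reachability for every final state; a decomposition into strongly connected components would be the usual choice for larger automata.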

2.4 Presburger automata

Existential Presburger formula

Atomic Presburger formulae are of the form $x_1 + x_2 + \cdots + x_n \le y_1 + \cdots + y_m$,
or $x_1 + \cdots + x_n \le K$, or $x_1 + \cdots + x_n \ge K$, for some constant $K \in \mathbb{N}$. Existential
Presburger formulae are Presburger formulae of the form $\exists \bar{x}\, \varphi$, where $\varphi$ is a
Boolean combination of atomic Presburger formulae.

We will be using Presburger formulae defining Parikh images of words.
Let $\Sigma = \{a_1, \ldots, a_k\}$ be a finite alphabet, and let $v \in \Sigma^*$ be a finite word.
We denote by $\#_v(a_i)$ the number of occurrences of $a_i$ in $v$. By $\mathrm{Parikh}(v)$ we
mean the Parikh image of $v$, i.e., $(\#_v(a_1), \ldots, \#_v(a_k))$.

With alphabet letters $a_1, \ldots, a_k$, we associate variables $x_{a_1}, \ldots, x_{a_k}$. A
Presburger formula $\varphi$ with free variables $x_{a_1}, \ldots, x_{a_k}$ is said to be a formula
over the alphabet $\{a_1, \ldots, a_k\}$. A word $v \in \Sigma^*$ satisfies it, written as
$v \models \varphi(x_{a_1}, \ldots, x_{a_k})$, if and only if $\varphi(\mathrm{Parikh}(v))$ holds.
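As an illustration of Parikh images, the sketch below counts letter occurrences of a finite word and evaluates a quantifier-free condition on the counts. Representing the formula as a Python predicate over a dictionary of counts is an assumption of this sketch; existential quantifiers are not modelled here.

```python
from collections import Counter

def parikh(word, alphabet):
    """Parikh image: occurrence counts of each letter, in alphabet order."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)

def satisfies(word, alphabet, formula):
    """Evaluate a quantifier-free formula on the Parikh image of `word`."""
    x = dict(zip(alphabet, parikh(word, alphabet)))
    return formula(x)

alphabet = ("a", "b")

def phi(x):
    """Example formula: x_a <= x_b and x_a >= 1."""
    return x["a"] <= x["b"] and x["a"] >= 1

print(parikh("abbab", alphabet))
print(satisfies("abbab", alphabet, phi))
```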

Presburger automata

A Presburger automaton is a pair $(\mathcal{A}_{fin}, \varphi)$, where $\mathcal{A}_{fin}$ is a finite state
automaton for finite words and $\varphi(x_{a_1}, \ldots, x_{a_k})$ is an existential Presburger
formula over the alphabet $\Sigma$. A word $w$ is accepted by $(\mathcal{A}_{fin}, \varphi)$ if
$w \in \mathcal{L}(\mathcal{A}_{fin})$ and $\varphi(\mathrm{Parikh}(w))$ holds; the set of accepted words is denoted by
$\mathcal{L}(\mathcal{A}_{fin}, \varphi)$.

Note that, as a convention, we will use the symbol $\mathcal{A}_{fin}$ for finite state
automata that work over finite words. We reserve the symbol $\mathcal{A}$ for Büchi
automata, which work over $\omega$-words.

As in [2, 8], the following result is the basis for all the decidability results
in this paper.

Theorem 1 ^ 191 The emptiness problem for Presburger automata is
decidable in NP.

3 Automata with data-constraints 

In this section we extend the definition of Büchi automata with data-constraints
over the input alphabet $\Sigma$. We then provide a decision procedure for its
emptiness problem, which all other decision procedures in this paper
extend.

Definition 2 An Automaton with Data-constraints, or in short ADC, is a
pair $(\mathcal{A}, \mathcal{C})$, where $\mathcal{A}$ is a Büchi automaton and $\mathcal{C}$ is a collection of
data-constraints over the alphabet $\Sigma$.

Let $w = \binom{a_1}{d_1}\binom{a_2}{d_2}\cdots$ be a data $\omega$-word. The ADC $(\mathcal{A}, \mathcal{C})$ accepts the data
$\omega$-word $w$ if $\mathrm{Proj}(w) \in \mathcal{L}(\mathcal{A})$ and $w \models \mathcal{C}$. We denote by $\mathcal{L}(\mathcal{A}, \mathcal{C})$ the language
that consists of all the data $\omega$-words accepted by the ADC $(\mathcal{A}, \mathcal{C})$.
We consider the following problem.

Problem: Omega-SAT-ADC

Input: An automaton with data-constraints $(\mathcal{A}, \mathcal{C})$

Question: Is there a data $\omega$-word $w \in \mathcal{L}(\mathcal{A}, \mathcal{C})$?



Theorem 3 The problem Omega-SAT-ADC is decidable in NEXPTime. Moreover,
if the collection $\mathcal{C}$ of data-constraints does not contain key-constraints, then
it is decidable in NP.

For the proof we first introduce some essential notations in
Subsection [3.1], then we outline the NEXPTime algorithm in Subsection [3.2]. The
NP algorithm can be found in Appendix [C].

Before we start the first proof in this paper, we want to remark on the
similarities and differences between the technique in this paper and the
one in [3]. The only similarity is that all techniques rely quite heavily on
Presburger counting. However, there is a different emphasis in the counting
process: in [2] the technique is to count the number of the so-called dog
labels and sheep labels (see pp. 35-36 in [2]), where intuitively, the dog
labels are used to represent the data values. In this paper the technique
involves counting directly the "number" of data values.

3.1 Some notations for the proof of Theorem [3]

For a data $\omega$-word $w$ and a non-empty subset $S \subseteq \Sigma$, we denote by

$[S]_w = \bigcap_{a \in S} V_w(a) \cap \bigcap_{b \notin S} \overline{V_w(b)}$,

where $\overline{V_w(b)}$ denotes the complement of $V_w(b)$, i.e. $D - V_w(b)$. It must
be noted that the sets $[S]_w$ are disjoint, and for each $a \in \Sigma$, $V_w(a)$ is
partitioned into $V_w(a) = \bigcup_{a \in S} [S]_w$. These two properties (disjointness and
partition) of the sets $[S]_w$ are crucial in our decision procedure.
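The sets $[S]_w$ group data values by the exact set of labels they occur with. The following sketch computes the non-empty classes for a finite data word, assuming the word is a list of (label, data value) pairs; the function name `classes` is hypothetical.

```python
from collections import defaultdict

def classes(word):
    """Map each label set S (as a frozenset) to the non-empty class [S]_w.

    A data value d belongs to [S]_w exactly when S is the set of all
    labels that d occurs with in the word.
    """
    labels_of = defaultdict(set)
    for a, d in word:
        labels_of[d].add(a)
    result = defaultdict(set)
    for d, labels in labels_of.items():
        result[frozenset(labels)].add(d)
    return dict(result)

w = [("a", 1), ("b", 1), ("a", 2), ("b", 3)]
cls = classes(w)
# Partition property: V_w(a) is the union of the classes [S]_w with a in S.
v_a = set().union(*(vals for s, vals in cls.items() if "a" in s))
print(cls)
print(v_a)
```

Because each data value is assigned to exactly one label set, disjointness of the classes holds by construction.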

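According to the cardinalities of the sets $[S]_w$, we divide the non-empty subsets
$S \subseteq \Sigma$ into three classes:

• $\mathcal{S}_0(w) = \{S \mid [S]_w = \emptyset\}$.

• $\mathcal{S}_{fin}(w) = \{S \mid [S]_w \text{ is a finite non-empty set}\}$.

• $\mathcal{S}_\infty(w) = \{S \mid [S]_w \text{ is an infinite set}\}$.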

Proposition 4 Proposition 1] For every data $\omega$-word $w$, the following
holds.

1. $w \models V(a) \subseteq \bigcup_{b \in R} V(b)$ if and only if $S \in \mathcal{S}_0(w)$, for all $S$ such that
$a \in S$, but $S \cap R = \emptyset$.

2. $w \models V(a) \cap V(b) = \emptyset$ if and only if $S \in \mathcal{S}_0(w)$, for all $S$ such that
$a, b \in S$.

Proof. (2) is immediate from the definition of $[S]_w$, while (1) follows from
the fact that

$V_w(a) \subseteq \bigcup_{b \in R} V_w(b)$ if and only if $V_w(a) \cap \bigcap_{b \in R} \overline{V_w(b)} = \emptyset$.

□

3.2 The algorithm



Let $(\mathcal{A}, \mathcal{C})$ be the given ADC and $\mathcal{M} = (Q, \mu)$ be the transition system,
where $\mathcal{A} = \mathcal{M}_{q_0}^{F}$. Roughly, our algorithm to determine whether $\mathcal{L}(\mathcal{A}, \mathcal{C}) = \emptyset$
is as follows.

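1. Guess a partition $\mathcal{S}_0, \mathcal{S}_{fin}, \mathcal{S}_\infty$ of the set $2^\Sigma - \{\emptyset\}$ that respects the
following conditions.

(C1) If the inclusion-constraint $V(a) \subseteq \bigcup_{b \in R} V(b)$ is in $\mathcal{C}$, then all the
sets $S$, where $a \in S$ and $S \cap R = \emptyset$, are in $\mathcal{S}_0$.

(C2) If the denial-constraint $V(a) \cap V(b) = \emptyset$ is in $\mathcal{C}$, then all the sets
$S$, which contain both $a$ and $b$, are in $\mathcal{S}_0$.

The intended meaning of the guesses $\mathcal{S}_0, \mathcal{S}_{fin}, \mathcal{S}_\infty$ is the sets $\mathcal{S}_0(w)$,
$\mathcal{S}_{fin}(w)$, $\mathcal{S}_\infty(w)$, respectively, for some $w \in \mathcal{L}(\mathcal{A}, \mathcal{C})$.
Moreover, Conditions (C1) and (C2) must be respected due to
Proposition [4].

2. Construct the following two items, of which the details are provided
below.

(a) A new alphabet $\tilde{\Sigma}$, which depends on the original alphabet $\Sigma$ and
the sets in $\mathcal{S}_\infty$.

(b) A transition system $\tilde{\mathcal{M}} = (\tilde{Q}, \tilde{\mu})$ over the alphabet $\tilde{\Sigma}$, which
depends on the original transition system $\mathcal{M}$ and the sets in $\mathcal{S}_\infty$.

3. Non-deterministically choose one state $q \in Q$ and construct the
following two items.

• a Presburger automaton $(\mathcal{A}_{fin}, \varphi)$, where $\mathcal{A}_{fin} = \mathcal{M}_{q_0}^{q}$ and the
formula $\varphi$ depends on the partition $\mathcal{S}_0, \mathcal{S}_{fin}, \mathcal{S}_\infty$ and the constraints
in $\mathcal{C}$;

• a Büchi automaton $\tilde{\mathcal{A}}$, which depends on the constraints in $\mathcal{C}$, the
new transition system $\tilde{\mathcal{M}}$, and the sets in $\mathcal{S}_\infty$.

4. Test the emptiness of $\mathcal{L}(\mathcal{A}_{fin}, \varphi)$ and $\mathcal{L}(\tilde{\mathcal{A}})$.

Then, $\mathcal{L}(\mathcal{A}, \mathcal{C}) \neq \emptyset$ if and only if $\mathcal{L}(\mathcal{A}_{fin}, \varphi) \neq \emptyset$ and $\mathcal{L}(\tilde{\mathcal{A}}) \neq \emptyset$.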

In the paragraphs below we will outline the details of Steps (2) and (3). The
analysis of the complexity is given in Appendix [A].

The proof of the correctness will follow from our claim that $\mathcal{L}(\mathcal{A}, \mathcal{C}) \neq \emptyset$
if and only if there exist some "correct" guesses for $\mathcal{S}_0, \mathcal{S}_{fin}, \mathcal{S}_\infty$ in Step (1)
and the state $q \in Q$ in Step (3) such that $\mathcal{L}(\mathcal{A}_{fin}, \varphi) \neq \emptyset$ and $\mathcal{L}(\tilde{\mathcal{A}}) \neq \emptyset$.
The details of the proof of the correctness will be given in Appendix [B]. The
main idea of the proof is that from a word $u \in \mathcal{L}(\mathcal{A}_{fin}, \varphi)$, we can construct
a finite data word $w$, and from an $\omega$-word $v \in \mathcal{L}(\tilde{\mathcal{A}})$, we can construct
a data $\omega$-word $w'$ such that $ww' \in \mathcal{L}(\mathcal{A}, \mathcal{C})$.

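Constructing the alphabet $\tilde{\Sigma}$ and the transition system $\tilde{\mathcal{M}}$

We define a set $\Sigma(\mathcal{S}_\infty) = \{(a, S) \mid a \in S \text{ and } S \in \mathcal{S}_\infty\}$. Then, the new
alphabet $\tilde{\Sigma}$ is $\tilde{\Sigma} = \Sigma \cup \Sigma(\mathcal{S}_\infty)$. The transition system $\tilde{\mathcal{M}} = (\tilde{Q}, \tilde{\mu})$ is defined
as $\tilde{Q} = Q$ and $\tilde{\mu} = \mu \cup \{(p, (a, S), q) \mid (p, a, q) \in \mu \text{ and } (a, S) \in \Sigma(\mathcal{S}_\infty)\}$.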
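The construction of the extended alphabet and transition relation is purely mechanical, as the following sketch suggests. It assumes transitions are (state, letter, state) triples and that the guessed family of label sets with infinite classes is given as a set of frozensets; the function name `extend` is hypothetical.

```python
def extend(mu, s_inf):
    """Build the extra letters (a, S) and the extended transition relation.

    Each original transition on letter a is duplicated for every letter
    (a, S) such that S is in s_inf and a is a member of S.
    """
    sigma_inf = {(a, S) for S in s_inf for a in S}
    mu_ext = set(mu) | {
        (p, (a, S), q)
        for (p, a, q) in mu
        for S in s_inf
        if a in S
    }
    return sigma_inf, mu_ext

mu = {("p", "a", "q"), ("q", "b", "p")}
s_inf = {frozenset({"a"})}
sigma_inf, mu_ext = extend(mu, s_inf)
print(sigma_inf)
print(len(mu_ext))
```

The state set is left unchanged, so the extended system differs from the original only in its transitions on the new letters.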

Constructing the Presburger automaton $(\mathcal{A}_{fin}, \varphi)$

Let $q \in Q$ be the state chosen non-deterministically in Step (3). The
automaton $\mathcal{A}_{fin}$ is simply $\mathcal{M}_{q_0}^{q}$. The Presburger formula $\varphi$ is defined as
follows. Let $S_1, \ldots, S_m$ be the enumeration of non-empty subsets of $\Sigma$, where
$m = 2^{|\Sigma|} - 1$.

The formula $\varphi$ is of the form $\exists z_{S_1} \cdots \exists z_{S_m}\, \psi$, where $\psi$ is the following
quantifier-free formula:

$\bigwedge_{a \in \Sigma} x_a \ge \sum_{S \ni a} z_S \;\wedge\; \bigwedge_{S \in \mathcal{S}_0} z_S = 0 \;\wedge\; \bigwedge_{S \in \mathcal{S}_{fin}} z_S \ge 1 \;\wedge\; \bigwedge_{V(a) \mapsto a \in \mathcal{C}} x_a = \sum_{a \in S} z_S$

Note that the constructed formula $\varphi$ does not involve the symbols in $\Sigma(\mathcal{S}_\infty)$.

Constructing the Büchi automaton $\tilde{\mathcal{A}}$

The Büchi automaton $\tilde{\mathcal{A}}$ is simply the intersection of $\tilde{\mathcal{M}}_{q}^{F}$ with the
automaton that checks the following conditions.

1. Each $(a, S) \in \Sigma(\mathcal{S}_\infty)$ appears infinitely many times.

2. If the key-constraint $V(a) \mapsto a \in \mathcal{C}$, then the symbol $a$ does not appear.

4 Automata with data-constraints and profiles



Given a data word $w = \binom{a_1}{d_1}\binom{a_2}{d_2}\cdots$, the profile word of $w$, denoted by
$\mathrm{Profile}(w)$, is the word

$\mathrm{Profile}(w) = (a_1, (L_1, R_1)), (a_2, (L_2, R_2)), \ldots$

over the alphabet $\Sigma \times \{*, \top, \bot\} \times \{*, \top, \bot\}$
such that for each position $i = 1, 2, \ldots$, the values of $L_i$ and $R_i$ are either
$\top$, or $\bot$, or $*$. If $L_i = \top$ and $i > 1$, it means that the position on the left,
$i - 1$, has the same data value as position $i$; otherwise $L_i = \bot$. If $i = 1$ (i.e.,
there is no position on the left), then $L_i = *$. The meaning of the $R_i$'s is
similar with respect to positions on the right of $i$.
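The profile of a finite data word can be computed in one pass, as in the sketch below. Two assumptions are made here: data words are lists of (label, data value) pairs, and the last position of a finite word gets `*` as its right component, by symmetry with the first position. The string encodings "T", "F" and "*" for the three profile symbols are also choices of this sketch.

```python
TOP, BOT, STAR = "T", "F", "*"

def profile(word):
    """Profile word of a finite data word.

    Each position is annotated with (L, R): whether the left and right
    neighbours carry the same data value, or STAR if no neighbour exists.
    """
    out = []
    n = len(word)
    for i, (a, d) in enumerate(word):
        if i == 0:
            left = STAR
        else:
            left = TOP if word[i - 1][1] == d else BOT
        if i == n - 1:
            right = STAR
        else:
            right = TOP if word[i + 1][1] == d else BOT
        out.append((a, (left, right)))
    return out

w = [("a", 1), ("b", 1), ("a", 2)]
print(profile(w))
```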

A profile Büchi automaton $\mathcal{A}$ is a Büchi automaton over the alphabet
$\Sigma \times \{*, \top, \bot\} \times \{*, \top, \bot\}$. It defines a set $\mathcal{L}_{data}(\mathcal{A})$ of data words as follows:
$w \in \mathcal{L}_{data}(\mathcal{A})$ if and only if $\mathcal{A}$ accepts $\mathrm{Profile}(w)$ in the standard sense.

A profile Büchi automaton with data-constraints is a tuple $(\mathcal{A}, \mathcal{C})$, where
$\mathcal{A}$ is a profile Büchi automaton and $\mathcal{C}$ is a collection of data-constraints. It
defines a set of data $\omega$-words as follows. A data $\omega$-word $w$ is accepted by
$(\mathcal{A}, \mathcal{C})$ if $\mathrm{Profile}(w) \in \mathcal{L}(\mathcal{A})$ and $w \models \mathcal{C}$.

Theorem 5 The emptiness problem for profile Büchi automata with
data-constraints is in 2-NEXPTime.

We give a sketch of the proof in Subsection [4.2]. The details can be
found in Appendix [D]. Before that we give a slight extension of profile Büchi
automata with data-constraints, which we call profile Büchi automata with
data-constraints on the state alphabet. It is a trivial extension, but it will be
very useful for our presentation in Appendix [F].

4.1 Profile Büchi automata with data-constraints on the state
alphabet

Definition 6 A profile Büchi automaton with data-constraints on the state
alphabet is a pair $(\mathcal{A}, \mathcal{C})$, where

• $\mathcal{A} = (Q, q_0, \mu, F)$ is a profile Büchi automaton, and

• $\mathcal{C}$ is a collection of data-constraints over the state alphabet $Q$ (instead
of over the input alphabet $\Sigma$ as in Definition [2]).

Let $w = \binom{a_1}{d_1}\binom{a_2}{d_2}\cdots$ be a data $\omega$-word, and $\rho = p_1 p_2 \cdots$ be a run of $\mathcal{A}$
on $\mathrm{Profile}(w)$. The induced data word of $w$ on $\rho$ is the data word $\rho(w) =
\binom{p_1}{d_1}\binom{p_2}{d_2}\cdots$.

The automaton $(\mathcal{A}, \mathcal{C})$ accepts the data $\omega$-word $w$, if there is an accepting
run $\rho$ of $\mathcal{A}$ on $\mathrm{Profile}(w)$ such that $\rho(w) \models \mathcal{C}$. We denote by $\mathcal{L}(\mathcal{A}, \mathcal{C})$ the
language that consists of all the data $\omega$-words accepted by the automaton
$(\mathcal{A}, \mathcal{C})$.


The upper bound in Theorem [5] still holds for the emptiness problem
of this type of automaton. Indeed, given an input $(\mathcal{A}, \mathcal{C})$, a profile Büchi
automaton with data-constraints on the state alphabet $Q$, we can reduce it to
$(\mathcal{A}', \mathcal{C}')$, a profile Büchi automaton with data-constraints over the alphabet
$Q \times \Sigma$, as follows. The automaton $\mathcal{A}'$ accepts the $\omega$-word (with profiles)
$(q_1, a_1, \mathrm{profile}_1)(q_2, a_2, \mathrm{profile}_2) \cdots$ if and only if $q_1 q_2 \cdots$ is an accepting run of
the automaton $\mathcal{A}$ on $(a_1, \mathrm{profile}_1)(a_2, \mathrm{profile}_2) \cdots$. The automaton $\mathcal{A}'$ simply
checks whether $(q_i, (a_i, \mathrm{profile}_i), q_{i+1})$ is a valid transition in $\mathcal{A}$. Furthermore,
the data-constraints over the alphabet $Q$ can be reduced to data-constraints
over the alphabet $Q \times \Sigma$ as follows.

1. The key-constraint $V(q) \mapsto q$ can be reduced to $V(q, a) \mapsto (q, a)$, for
each $a \in \Sigma$, and denial-constraints $V(q, a) \cap V(q, b) = \emptyset$, whenever $a \neq b$
and $a, b \in \Sigma$.

2. The inclusion-constraint $V(q) \subseteq \bigcup_{p \in R} V(p)$ can be reduced to
inclusion-constraints $V(q, a) \subseteq \bigcup_{p \in R,\, b \in \Sigma} V(p, b)$, for each $a \in \Sigma$.

3. The denial-constraint $V(q) \cap V(p) = \emptyset$ can be reduced to
denial-constraints $V(q, a) \cap V(p, b) = \emptyset$, for each $a, b \in \Sigma$.
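The three reduction rules above can be sketched as a function that lifts one constraint over the state alphabet to a list of constraints over pairs (state, letter). The tuple encoding of constraints ("key", "inclusion", "denial") and the name `lift` are assumptions of this sketch; since denial-constraints are symmetric, only unordered pairs of distinct letters are generated for the key rule.

```python
from itertools import combinations

def lift(constraint, sigma):
    """Lift a data-constraint over states Q to constraints over Q x Sigma."""
    kind = constraint[0]
    if kind == "key":
        q = constraint[1]
        keys = [("key", (q, a)) for a in sigma]
        denials = [("denial", (q, a), (q, b)) for a, b in combinations(sigma, 2)]
        return keys + denials
    if kind == "inclusion":
        q, r = constraint[1], constraint[2]
        targets = frozenset((p, b) for p in r for b in sigma)
        return [("inclusion", (q, a), targets) for a in sigma]
    if kind == "denial":
        q, p = constraint[1], constraint[2]
        return [("denial", (q, a), (p, b)) for a in sigma for b in sigma]
    raise ValueError(f"unknown constraint kind: {kind}")

sigma = ["a", "b"]
print(lift(("key", "q"), sigma))
print(lift(("denial", "q", "p"), sigma))
```

The number of lifted constraints grows at most quadratically in the size of the input alphabet, so this step does not affect the overall complexity bound.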

4.2 Sketch of proof of Theorem [5]

The proof is an extension of the one in the previous section. However, we
need a few more auxiliary terms. Let $w = \binom{a_1}{d_1}\binom{a_2}{d_2}\cdots$ be a data $\omega$-word
over $\Sigma$. A zone is a maximal interval $[i, j]$ with the same data values, i.e.
$d_i = d_{i+1} = \cdots = d_j$ and $d_{i-1} \neq d_i$ (if $i > 1$) and $d_j \neq d_{j+1}$ (if $j < n$). The
zone $[i, j]$ is called an $S$-zone, if $S$ is the set of labels occurring in the zone.

The zonal partition of $w$ is a sequence $(k_1, k_2, \ldots)$, where $1 \le k_1 < k_2 <
\cdots$ such that $[1, k_1], [k_1 + 1, k_2], \ldots$ are the zones in $w$. Let the zone $[1, k_1]$
be an $S_1$-zone, $[k_1 + 1, k_2]$ an $S_2$-zone, $[k_2 + 1, k_3]$ an $S_3$-zone, and so on.
The zonal word of $w$ is a data word over $\Sigma \cup 2^\Sigma$ defined as follows.

$\mathrm{Zonal}(w) = \binom{S_1}{d_{k_1}}\binom{a_1}{d_1}\cdots\binom{a_{k_1}}{d_{k_1}}\binom{S_2}{d_{k_2}}\binom{a_{k_1+1}}{d_{k_1+1}}\cdots\binom{a_{k_2}}{d_{k_2}}\cdots$

That is, the zonal word of a data word is a word in which each zone is
preceded by a label $S \in 2^\Sigma$, if the zone is an $S$-zone.
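Zones and the zonal word can be computed for a finite data word by grouping consecutive positions with equal data values. The sketch below assumes data words are lists of (label, data value) pairs and represents the zone label as a frozenset of the labels occurring in the zone, paired with the zone's data value; the function name `zonal` is hypothetical.

```python
from itertools import groupby

def zonal(word):
    """Zonal word of a finite data word.

    Consecutive positions with the same data value form a zone; each zone
    is preceded by a position labelled with the set of labels in the zone.
    """
    out = []
    for d, group in groupby(word, key=lambda pos: pos[1]):
        zone = list(group)
        labels = frozenset(a for a, _ in zone)
        out.append((labels, d))
        out.extend(zone)
    return out

w = [("a", 1), ("b", 1), ("a", 2), ("a", 1)]
print(zonal(w))
```

Note that `groupby` only merges adjacent positions, so the value 1 in the example gives rise to two separate zones, matching the requirement that zones be maximal intervals.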



Moreover, it is sufficient to assume that only the positions labeled with
symbols from $2^\Sigma$ carry data values, i.e., the data values of their respective
zones. Obviously every two consecutive zones have different data values,
thus, two consecutive positions (in $\mathrm{Zonal}(w)$) labeled with symbols from $2^\Sigma$
also have different data values.

Furthermore, if $w$ is a data $\omega$-word over $\Sigma$, then for each $a \in \Sigma$, $V_w(a) =
\bigcup_{a \in S} V_{\mathrm{Zonal}(w)}(S)$. Proposition [7] below shows that data-constraints for data
words over the alphabet $\Sigma$ can be converted into data-constraints for the
zonal data words over the alphabet $\Sigma \cup 2^\Sigma$.

Proposition 7 For every data word $w$ over $\Sigma$, the following holds.

• A data $\omega$-word $w$ satisfies a key-constraint $V(a) \mapsto a$ if and only if its
zonal data word $\mathrm{Zonal}(w)$ satisfies the following constraints.

K1. The key-constraints $V(R) \mapsto R$, for each $R$ such that $a \in R$.

K2. The denial-constraints $V(R) \cap V(R') = \emptyset$, for each $R, R'$ such
that $a \in R, R'$ and $R \neq R'$.

K3. The symbol $a$ occurs at most once in every zone in $\mathrm{Zonal}(w)$.

(By a zone in $\mathrm{Zonal}(w)$, we mean a maximal interval in which
all positions are labeled with symbols from $\Sigma$.)

• A data $\omega$-word $w$ satisfies an inclusion-constraint $V(a) \subseteq \bigcup_{b \in S} V(b)$ if
and only if its zonal data word $\mathrm{Zonal}(w)$ satisfies the following
inclusion-constraints:

$V(R) \subseteq \bigcup_{S' \cap S \neq \emptyset} V(S')$

for each $R$ such that $a \in R$.

• A data $\omega$-word $w$ satisfies a denial-constraint $V(a) \cap V(b) = \emptyset$ if
and only if its zonal data word $\mathrm{Zonal}(w)$ satisfies the following
denial-constraints:

$V(R) \cap V(R') = \emptyset$

for each $R$ and $R'$ such that $a \in R$ and $b \in R'$.

Proof. The proof is straightforward due to the fact that

$V_w(a) = \bigcup_{a \in S} V_{\mathrm{Zonal}(w)}(S)$.

□

Now, given a profile automaton $\mathcal{A}$ over the alphabet $\Sigma$, we can construct
in exponential time an automaton $\mathcal{A}^{zonal}$ such that for all data $\omega$-words $w$,

$\mathrm{Profile}(w) \in \mathcal{L}(\mathcal{A})$ if and only if $\mathrm{Proj}(\mathrm{Zonal}(w)) \in \mathcal{L}(\mathcal{A}^{zonal})$.

Such an automaton $\mathcal{A}^{zonal}$ is called a zonal automaton of $\mathcal{A}$. Moreover, if the
key-constraint $V(a) \mapsto a \in \mathcal{C}$, we can impose the condition K3 in
Proposition [7] inside the automaton $\mathcal{A}^{zonal}$. This, together with Proposition [7],
implies that the emptiness problem of profile Büchi automata with
data-constraints can be reduced to an instance of the following problem.



Problem: Omega-SAT-zonal-automata

Input: • a zonal automaton $\mathcal{A}^{zonal}$
       • a collection $\mathcal{C}^{zonal}$ of data-constraints over the alphabet $2^\Sigma$

Question: Is there a zonal word $w$ such that
       • $\mathrm{Proj}(w) \in \mathcal{L}(\mathcal{A}^{zonal})$ and $w \models \mathcal{C}^{zonal}$, and
       • in which two consecutive positions labeled with symbols from $2^\Sigma$
         have different data values?



The algorithm in Subsection [3.2] can be adapted to solve
Omega-SAT-zonal-automata. Extra care is needed for the following two issues: (1)
that every two consecutive zones must be assigned different data values, and
(2) the possibility that the given zonal automaton accepts only $\omega$-words with
finitely many zones. We refer the reader to Appendix [D] for the details.



5 Two-variable logic for data $\omega$-words

For the purpose of logical definability, we view data $\omega$-words as structures

$w = (\mathbb{N}, +1, \{a(\cdot)\}_{a \in \Sigma}, \sim)$, (1)

where $\mathbb{N}$ is the set of natural numbers $\{1, 2, \ldots\}$ which indicate the positions, $+1$
is the successor relation (i.e., $+1(i, j)$ iff $i + 1 = j$), the $a(\cdot)$'s are the labeling
predicates, and $i \sim j$ holds iff positions $i$ and $j$ have the same data value.

We let FO stand for first-order logic, MSO for monadic second-order
logic (which extends FO with quantification over sets of positions), and
$\exists$MSO for existential monadic second-order logic, i.e., sentences of the form
$\exists X_1 \ldots \exists X_m\, \psi$, where $\psi$ is an FO formula over the vocabulary extended
with the unary predicates $X_1, \ldots, X_m$. We let FO$^2$ stand for FO with two
variables, i.e., the set of FO formulae that only use two variables $x$ and
$y$. The set of all sentences of the form $\exists X_1 \ldots \exists X_m\, \psi$, where $\psi$ is an FO$^2$
formula, is denoted by $\exists$MSO$^2$.

To emphasize that we are talking about a logic over data words we write
$(+1, \sim)$ after the logic: e.g., FO$^2(+1, \sim)$ and $\exists$MSO$^2(+1, \sim)$. Note that
$\exists$MSO$^2(+1)$ is equivalent in expressive power to MSO over the usual (not
data) finite words, i.e., it defines precisely the regular languages pO] .

It was shown in [3] that $\exists$MSO$^2(+1, <, \sim)$ is decidable over data words.
In terms of complexity, the satisfiability of this logic is shown to be at
least as hard as reachability in Petri nets. Without the $+1$ relation, the
complexity drops to NEXPTime-complete; however, without $+1$ the logic is
not sufficiently expressive to capture regular relations on the data-free part
of the finite word.

In this section we will prove the following:

Theorem 8 The satisfiability problem is decidable for $\exists$MSO$^2(+1, \sim)$ over
data $\omega$-words. Moreover, the complexity of the decision procedure is
elementary.

5.1 A normal form for $\exists$MSO$^2(+1, \sim)$

Decidability proofs for two-variable logics typically follow this pattern: first a
syntactic normal form is established, followed by a combinatorial proof
in which decidability is proved for that normal form (by establishing the
finite-model property, or by automata techniques, for example).

Our proof is no different: it starts by establishing a normal form for
FO$^2(+1, \sim)$, and then proves decidability for the normal form. In fact,
our normal form follows closely the one given in [2] for unranked finite data
trees. It can simply be adapted to the case of $\omega$-words. It easily follows
from [2] that every $\exists$MSO$^2(+1, \sim)$ sentence over data $\omega$-words is equivalent
to a sentence

$\exists X_1 \ldots \exists X_k \left( \chi \wedge \bigwedge_i \varphi_i \wedge \bigwedge_j \psi_j \right)$

where

1. $\chi$ is an FO$^2(+1)$ sentence over the extended alphabet $\Sigma \times \{*, \top, \bot\} \times
\{*, \top, \bot\}$ (and it can be converted to a profile Büchi automaton in
elementary complexity);

2. each $\varphi_i$ is of the form $\forall x \forall y\,(\alpha(x) \wedge \alpha(y) \wedge x \sim y \to x = y)$, where $\alpha$ is
a conjunction of labeling predicates, $X_j$'s, and their negations; and

3. each $\psi_j$ is of the form $\forall x \exists y\,(\alpha(x) \to (x \sim y \wedge \alpha'(y)))$, with $\alpha, \alpha'$ as in
item 2.

The number of the unary predicates $X_j$ is single exponential in the size of
the original input sentence.

If we extend the alphabet to $\Sigma \times 2^k$ so that each label also specifies the
family of the $X_j$'s the node belongs to, then sentences in items 2 and 3 can
be encoded by data-constraints: formulae in item 2 become key- and
denial-constraints, and formulae in item 3 become inclusion-constraints. The sentence
in item 1 simply becomes an FO$^2(+1)$ sentence over the alphabet $\Sigma \times 2^k$.

Indeed, consider, for example, the sentence $\forall x \forall y\,(\alpha(x) \wedge \alpha(y) \wedge x \sim
y \to x = y)$. Let $\Sigma'$ be the set of all symbols $(a, \bar{b}) \in \Sigma \times 2^k$ consistent
with $\alpha$. That is, $a$ is the labeling symbol used in $\alpha$ (if $\alpha$ uses one) or an
arbitrary letter (if $\alpha$ does not use a labeling predicate), and the Boolean
vector $\bar{b}$ has 1 in positions of the $X_j$'s used positively in $\alpha$ and 0 in positions
of $X_j$'s used negatively in $\alpha$. Then the original sentence is equivalent to
the key-constraints $V(a) \mapsto a$, for each $a \in \Sigma'$, and the denial-constraints
$V(a) \cap V(b) = \emptyset$, for every $a, b \in \Sigma'$ with $a \neq b$. The transformation of item
3 sentences into inclusion-constraints is the same.

Hence, the satisfiability problem of $\exists$MSO$^2(+1, \sim)$ can be reduced to the
emptiness problem of profile Büchi automata with data-constraints, whose
elementary complexity has been established in the previous section.

6 LTL that handles data values 

In this section we extend the standard LTL with the operators $\Diamond^w$, $\Diamond^s$,
$X^{=}$, $X^{\neq}$ to handle comparison between data values; we denote the resulting logic by
LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$].

Let $\Sigma$ be a finite alphabet. Formally, the logic LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] is
defined as follows.

• Both True and False are LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] formulae.

• For each $a \in \Sigma$, $a$ is an LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] formula.

• If $\varphi$ and $\psi$ are LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] formulae, then so are

$\neg\varphi$ ; $\varphi \vee \psi$ ; $\varphi \wedge \psi$ ; $X\varphi$ ; $\varphi\, U\, \psi$ ; $\varphi\, R\, \psi$

• If $\varphi$ is an LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] formula, then so are

$\Diamond^w \varphi$ ; $\Diamond^s \varphi$ ; $X^{=} \varphi$ ; $X^{\neq} \varphi$

The operators $X$, $U$, $R$ stand for neXt, Until and Release, respectively. We
write $F\varphi$ as an abbreviation for $\mathrm{True}\, U\, \varphi$ and $G\varphi$ for $\neg F(\neg\varphi)$. The operators
$\Diamond^w \varphi$, $\Diamond^s \varphi$ check for the existence of a position with the same data value in which
the formula $\varphi$ holds.

We will not give the formal semantics of LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] here; it
can be found in Appendix [E]. Instead we give only the intuitive meanings of
the operators $\Diamond^w$, $\Diamond^s$, $X^{=}$ and $X^{\neq}$, which are as follows.

• The formula $X^{=}$ holds in position $i$, if it has the same data value as
the next position $i + 1$.

• The formula $X^{\neq}$ holds in position $i$, if it has a different data value from the
next position $i + 1$.

• The formula $\Diamond^w \varphi$ holds in position $i$, if there exists a position $j$ that
has the same data value as position $i$ and in which the formula $\varphi$ holds.

• The formula $\Diamond^s \varphi$ holds in position $i$, if there exists a position $j \neq i$
that has the same data value as position $i$ and in which the formula $\varphi$
holds.
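The intuitive meanings of the data operators can be made concrete on a finite data word. The sketch below is only an illustration under several assumptions: words are finite lists of (label, data value) pairs, positions are 0-based rather than 1-based, subformulas are passed as Python predicates taking a word and a position, and all function names are hypothetical. The formal semantics over $\omega$-words is the one given in the appendix of the paper, not this code.

```python
def same_next(word, i):
    """Position i has the same data value as position i + 1 (if it exists)."""
    return i + 1 < len(word) and word[i][1] == word[i + 1][1]

def diamond_weak(word, i, phi):
    """Some position j (possibly j == i) with the data value of i satisfies phi."""
    return any(
        word[j][1] == word[i][1] and phi(word, j) for j in range(len(word))
    )

def diamond_strong(word, i, phi):
    """Some position j != i with the data value of i satisfies phi."""
    return any(
        j != i and word[j][1] == word[i][1] and phi(word, j)
        for j in range(len(word))
    )

def label_is(a):
    """Atomic formula: the position carries label a."""
    return lambda word, j: word[j][0] == a

def globally_key(word, a):
    """G(a -> not strong-diamond a): no two a-positions share a data value."""
    return all(
        not (word[i][0] == a and diamond_strong(word, i, label_is(a)))
        for i in range(len(word))
    )

w = [("a", 1), ("b", 1), ("a", 2)]
print(diamond_weak(w, 0, label_is("a")))
print(diamond_strong(w, 0, label_is("a")))
print(globally_key(w, "a"))
```

On this word the weak operator holds at the first position because the position itself is an a-position, while the strong operator fails because no other a-position carries the value 1.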

For an data w-word w and a formula (p € LTL[0"', O*, X^, X^], we write 
w,i \= ip to denote that in position i the formula cp holds. As usual, for a 
formula cp E LTL[0"', O*, X^, X^], we denote by C{ip) the set of words w for 
which w,l \= ^p- 

Notice the subtle difference between $\Diamond^w$ and $\Diamond^s$, where $w$ stands for "weak" and $s$ for "strong," respectively. With $\Diamond^w$ it is not necessary that the position $j$ is different from the current position, while with $\Diamond^s$ the position $j$ must be different. Obviously, $\Diamond^w$ is weaker than $\Diamond^s$, as $\Diamond^w\varphi$ can be expressed as $\varphi \vee \Diamond^s\varphi$, hence the names "weak" and "strong." In fact there exists a language expressible in LTL[$\Diamond^s$], but not in LTL[$\Diamond^w$].

At first glance, it may appear that $\Diamond^w$ is too weak to capture any interesting property. But as we will see later, the satisfiability problem even for LTL[$\Diamond^w$] is already NEXPTime-complete.

We will denote by LTL[$\Diamond^w$] and LTL[$\Diamond^s$] the classes of formulae that use only $\Diamond^w$ and $\Diamond^s$, respectively, but do not use the operators $X^{=}$ and $X^{\neq}$. We give some examples which will be used in the later sections.

Example 1 Consider the language $L_{key}(a)$ which consists of data words in which every two positions labeled with $a$ have different data values. $L_{key}(a)$ is expressible by the LTL[$\Diamond^s$] formula $G(a \to \neg\Diamond^s a)$. On the other hand, the formula $G(a \to \neg\Diamond^w a)$ does not make much sense, as essentially it only expresses the data words in which the symbol $a$ does not appear.

Example 2 Consider the formula $\varphi := G(a \to \Diamond^s a)$ over the alphabet $\Sigma$. Then, $w \in \mathcal{L}(\varphi)$ if and only if every data value in $V_w(a)$ appears at least twice (among $a$-positions). This language $\mathcal{L}(\varphi)$ cannot be captured by an ADC.

Now consider a slightly different representation of the formula $\varphi$. Let $\bar{\Sigma}$ be a copy of the alphabet $\Sigma$, in which $\bar{b} \in \bar{\Sigma}$ denotes the corresponding symbol of $b \in \Sigma$. Consider the following formula $\varphi' := G(a \to \Diamond^s \bar{a})$ over the alphabet $\Sigma \cup \bar{\Sigma}$. Essentially $\varphi$ and $\varphi'$ are equivalent up to renaming $\bar{a}$ back to $a$. However, $\mathcal{L}(\varphi')$ can be captured by an ADC. This simple trick will be useful in our translation of LTL[$\Diamond^s$] to an ADC for the purpose of deciding the satisfiability problem for LTL[$\Diamond^s$].
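The two languages of Examples 1 and 2 can also be checked directly on a word, without going through the logic. The following is a minimal sketch under the assumption that a data word is a finite Python list of (label, data value) pairs; the paper works with $\omega$-words, so this only illustrates the properties on finite prefixes. The function names are illustrative and do not come from the paper.

```python
from collections import Counter
from typing import List, Tuple

DataWord = List[Tuple[str, int]]  # (label, data value) pairs; finite words only


def in_key_language(w: DataWord, a: str) -> bool:
    """Example 1: no two a-positions carry the same data value."""
    values = [d for (label, d) in w if label == a]
    return len(values) == len(set(values))


def every_value_twice(w: DataWord, a: str) -> bool:
    """Example 2: each data value found at an a-position occurs at >= 2 a-positions."""
    counts = Counter(d for (label, d) in w if label == a)
    return all(c >= 2 for c in counts.values())
```

Note that both checks only look at $a$-positions; a data value shared between an $a$-position and a $b$-position does not matter for either property.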

Theorem 9

1. The satisfiability problem for LTL[$\Diamond^w$] is NEXPTime-complete.

2. The satisfiability problem for LTL[$\Diamond^s$] is in 2-NEXPTime.

3. The satisfiability problem for LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] is in 3-NEXPTime.

The proofs for the upper bounds in Theorem 9 can be found in Appendix F. The proof for the hardness part in (1) can be found in Appendix G.

A Analysis of the time complexity of the algorithm in Subsection 3.2



Obviously Step (1) takes exponential time in the size of the alphabet $\Sigma$. Moreover, the sizes of the automaton $\mathcal{A}_{fin}$, the formula $\varphi$ and the Büchi automaton $\tilde{\mathcal{A}}$ are all exponential in the size of the original alphabet $\Sigma$. The emptiness of the Büchi automaton $\tilde{\mathcal{A}}$ can be checked in polynomial time, while the emptiness of the Presburger automaton $(\mathcal{A}_{fin}, \varphi)$ can be checked in NP. So overall our algorithm works in NEXPTime.

B Proof of the correctness of the algorithm in Subsection 3.2

Throughout this section we fix an ADC $(\mathcal{A}, \mathcal{C})$ and the transition system $M = (Q, \mu)$ of $\mathcal{A}$, where $\mathcal{A} = M_{q_0}^{F}$. We will demonstrate the following two claims, whose proofs are provided in the subsequent two subsections.

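Claim 1 Suppose there exists a data $\omega$-word $w \in \mathcal{L}(\mathcal{A}, \mathcal{C})$. Then, by fixing $\mathcal{S}_0 = \mathcal{S}_0(w)$, $\mathcal{S}_{fin} = \mathcal{S}_{fin}(w)$ and $\mathcal{S}_\infty = \mathcal{S}_\infty(w)$, the constructed Presburger automaton $(\mathcal{A}_{fin}, \varphi)$ and Büchi automaton $\tilde{\mathcal{A}}$ are both non-empty.

Claim 2 Suppose there exists a partition $\mathcal{S}_0, \mathcal{S}_{fin}, \mathcal{S}_\infty$ of the set $2^\Sigma - \{\emptyset\}$ such that the constructed Presburger automaton $(\mathcal{A}_{fin}, \varphi)$ and Büchi automaton $\tilde{\mathcal{A}}$ are both non-empty. Then, there exists a data $\omega$-word $w \in \mathcal{L}(\mathcal{A}, \mathcal{C})$ such that $\mathcal{S}_0(w) = \mathcal{S}_0$, $\mathcal{S}_{fin}(w) = \mathcal{S}_{fin}$ and $\mathcal{S}_\infty(w) = \mathcal{S}_\infty$.

We write $w[\le i]$ to denote the initial segment of $w$ of length $i$, and $w[\ge i]$ to denote the $\omega$-word obtained by discarding the initial segment of length $i - 1$ from $w$. Then, $\mathrm{Proj}(w[\le i]) = a_1 \cdots a_i$, and $\mathrm{Proj}(w[\ge i]) = a_i a_{i+1} \cdots$.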

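B.1 Proof of Claim 1

Let $w$ be a data $\omega$-word accepted by $(\mathcal{A}, \mathcal{C})$. Let $\mathcal{S}_0 = \mathcal{S}_0(w)$, $\mathcal{S}_{fin} = \mathcal{S}_{fin}(w)$, and $\mathcal{S}_\infty = \mathcal{S}_\infty(w)$. Let $N$ be the minimal index such that for each $S \in \mathcal{S}_{fin}$,

\[ [S]_{w[\le N]} = [S]_w . \]

Let $\rho = p_1 p_2 \cdots$ be the accepting run of $\mathcal{A}$ on $\mathrm{Proj}(w)$. Let $\tilde{\Sigma}$ and $\tilde{M}$ be the new alphabet and the transition system constructed in Step (2) of our algorithm. Then, we pick the state $p_N$ as the state $q$ that is non-deterministically chosen in Step (3) of our algorithm. The Presburger automaton $(\mathcal{A}_{fin}, \varphi)$ constructed in Step (3) has the final state $p_N$, while the Büchi automaton $\tilde{\mathcal{A}}$ has the initial state $p_N$. That is, both automata are obtained from $\tilde{M}$, with $p_N$ as the final state of the former and the initial state of the latter.

Consider the $\omega$-word (without data) $x_1 x_2 \cdots$ over the alphabet $\tilde{\Sigma}$, where

\[ x_i = \begin{cases} a_i & \text{if } d_i \in [S]_w \text{ and } S \in \mathcal{S}_{fin}, \\ (a_i, S) \in \Sigma(\mathcal{S}_\infty) & \text{if } d_i \in [S]_w \text{ and } S \in \mathcal{S}_\infty. \end{cases} \]

We claim the following:

• $v_1 = x_1 x_2 \cdots x_N \in \mathcal{L}(\mathcal{A}_{fin}, \varphi)$.

• $v_2 = x_{N+1} x_{N+2} \cdots \in \mathcal{L}(\tilde{\mathcal{A}})$.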

B.1.1 Proof of $v_1 = x_1 x_2 \cdots x_N \in \mathcal{L}(\mathcal{A}_{fin}, \varphi)$

There are two things to show here:

1. That $v_1$ is accepted by $\mathcal{A}_{fin}$.

2. That $\varphi(\mathrm{Parikh}(x_1 \cdots x_N))$ holds.

It is straightforward to verify that $p_1 \cdots p_N$ is a run of $\mathcal{A}_{fin}$ on $x_1 \cdots x_N$. That it is an accepting run follows from the fact that $p_N$ is a final state in $\mathcal{A}_{fin}$.

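Now we will show that $\varphi(\mathrm{Parikh}(x_1 \cdots x_N))$ holds. Recall that the formula $\varphi$ is of the form:

\[ \exists z_{S_1} \cdots \exists z_{S_m}\; \psi_1 \wedge \psi_2 \wedge \psi_3 \wedge \psi_4 \]

where

• the formula $\psi_1$ is the conjunction

\[ \bigwedge_{a \in \Sigma} x_a \ge \sum_{S \ni a} z_S \]

• the formula $\psi_2$ is the conjunction

\[ \bigwedge_{S \in \mathcal{S}_0 \cup \mathcal{S}_\infty} z_S = 0 \]

• the formula $\psi_3$ is the conjunction

\[ \bigwedge_{S \in \mathcal{S}_{fin}} z_S \ge 1 \]

• the formula $\psi_4$ is the conjunction

\[ \bigwedge_{V(a) \mapsto a \,\in\, \mathcal{C}} x_a = \sum_{S \ni a} z_S \]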

In order to show that $\varphi(\mathrm{Parikh}(v_1))$ holds, for each $S \subseteq \Sigma$, we pick the following integers as witnesses for $z_S$.

• $z_S = |[S]_{w[\le N]}|$, for each $S \in \mathcal{S}_{fin}(w)$.

• $z_S = 0$, for each $S \notin \mathcal{S}_{fin}(w)$.

We need to show that all the formulae $\psi_1, \ldots, \psi_4$ above are satisfied. First, we observe the following two points. For each $a \in \Sigma$,

1. $\#_a(v_1)$ is precisely the number of $a$-positions in $w[\le N]$ whose data value is from the set

\[ \bigcup_{S \in \mathcal{S}_{fin}} [S]_w . \]

2. $\#_{(a,S)}(v_1)$ is precisely the number of $a$-positions in $w[\le N]$ whose data value is from the set $[S]_w$. Recall that in this case $S \in \mathcal{S}_\infty$.

Then, $\psi_1$ follows immediately from (1): the number $\#_a(v_1)$ of such $a$-positions must be at least the number of their data values, $\sum_{S \in \mathcal{S}_{fin},\, a \in S} |[S]_w|$. The formulae $\psi_2$ and $\psi_3$ follow immediately from the definition. The formula $\psi_4$ holds because of (1) and because, under a key-constraint, the number $\#_a(v_1)$ of such $a$-positions is precisely the number of their data values, $\sum_{S \in \mathcal{S}_{fin},\, a \in S} |[S]_w|$.

B.1.2 Proof of $v_2 = x_{N+1} x_{N+2} \cdots \in \mathcal{L}(\tilde{\mathcal{A}})$

Recall that the Büchi automaton $\tilde{\mathcal{A}}$ is the intersection of $\tilde{M}_{p_N}^{F}$ with the automaton that checks the following conditions.

1. Each $(a, S) \in \Sigma(\mathcal{S}_\infty)$ appears infinitely many times.

2. If the key-constraint $V(a) \mapsto a \in \mathcal{C}$, then the symbol $a$ does not appear.

Now, to show that $v_2 \in \mathcal{L}(\tilde{\mathcal{A}})$, we claim that $p_{N+1} p_{N+2} \cdots$ is also an accepting run of $\tilde{\mathcal{A}}$ on $v_2$.

First, we show that $x_{N+1} x_{N+2} \cdots$ satisfies the properties (1) and (2) above. As $S \in \mathcal{S}_\infty(w)$, the set $[S]_w$ is infinite, so for each $a \in S$ there are infinitely many $a$-positions in $w$ whose data value belongs to $[S]_w$. By our construction of $x_{N+1} x_{N+2} \cdots$, it means each symbol $(a, S) \in \Sigma(\mathcal{S}_\infty)$ appears infinitely often in $x_{N+1} x_{N+2} \cdots$.

Furthermore, recall that $N$ is an index such that $[S]_{w[\le N]} = [S]_w$, for each $S \in \mathcal{S}_{fin}(w)$. Now, if $V(a) \mapsto a \in \mathcal{C}$, then every $a$-position greater than $N$ in $w$ has a data value from the set $\bigcup_{S \in \mathcal{S}_\infty(w)} [S]_w$. This means that by our construction of $x_{N+1} x_{N+2} \cdots$, the symbol $a$ does not appear in $x_{N+1} x_{N+2} \cdots$.

To show that $x_{N+1} x_{N+2} \cdots$ is accepted by $\tilde{M}_{p_N}^{F}$, we observe that $p_{N+1} p_{N+2} \cdots$ is an accepting run of $\tilde{M}_{p_N}^{F}$ on $x_{N+1} x_{N+2} \cdots$, which is immediate by our construction of $\tilde{M}$.

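B.2 Proof of Claim 2

Suppose there are the following items:

• a partition $\mathcal{S}_0, \mathcal{S}_{fin}, \mathcal{S}_\infty$ of $2^\Sigma - \{\emptyset\}$;

• the constructed new alphabet $\tilde{\Sigma} = \Sigma \cup \Sigma(\mathcal{S}_\infty)$ and transition system $\tilde{M} = (\tilde{Q}, \tilde{\mu})$;

• a state $q \in Q$,

such that the constructed Presburger automaton $(\mathcal{A}_{fin}, \varphi)$ and the Büchi automaton $\tilde{\mathcal{A}}$ are not empty. Consider the following two words.

• $v_1 = b_1 \cdots b_N \in \mathcal{L}(\mathcal{A}_{fin}, \varphi)$, where $p_1 \cdots p_N$ is an accepting run of $\mathcal{A}_{fin}$ on $v_1$.

• $v_2 = b_{N+1} b_{N+2} \cdots \in \mathcal{L}(\tilde{\mathcal{A}})$, where $p_{N+1} p_{N+2} \cdots$ is an accepting run of $\tilde{\mathcal{A}}$ on $v_2$.

We will construct a data $\omega$-word $w \in (\Sigma \times D)^{\omega}$,

\[ w = \binom{a_1}{d_1} \binom{a_2}{d_2} \cdots \binom{a_N}{d_N} \binom{a_{N+1}}{d_{N+1}} \cdots \]

which is accepted by $(\mathcal{A}, \mathcal{C})$.

We start by defining $\mathrm{Proj}(w) = a_1 a_2 \cdots$. For each $i = 1, 2, \ldots$,

\[ a_i = \begin{cases} b_i & \text{if } b_i \in \Sigma, \\ c & \text{if } b_i = (c, S) \in \Sigma(\mathcal{S}_\infty) \text{ for some } S. \end{cases} \]

By the construction of $\mathcal{A}_{fin}$ and $\tilde{\mathcal{A}}$, it is immediate that $p_1 p_2 \cdots p_N p_{N+1} p_{N+2} \cdots$ is an accepting run of $\mathcal{A}$ on $\mathrm{Proj}(w)$.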

Now we will define the data values $d_1, d_2, \ldots$. For each $S \in \mathcal{S}_\infty$, we fix an infinite set $D_S$ of data values, such that all those sets $D_S$ are disjoint. We will use $D_S$ for $[S]_w$ for each $S \in \mathcal{S}_\infty$.

For each $S \in \mathcal{S}_{fin}$, the set $[S]_w$ can be computed as follows. By the assumption, $v_1$ is a word such that $\varphi(\mathrm{Parikh}(v_1))$ holds, where $m_S$ is a witness for the variable $z_S$. Let $K = \sum_S m_S$. Define a function

\[ \xi : \{1, \ldots, K\} \to 2^\Sigma - \{\emptyset\}, \]

such that $|\xi^{-1}(S)| = m_S$. We will use $\xi^{-1}(S)$ as $[S]_w = [S]_{w[\le N]}$. The assignment of data values to $w$ can be done as follows.

1. We first define the data values $d_1, \ldots, d_N$. For each $a \in \Sigma$, pick the positions $Z(a) = \{i \mid b_i = a\}$. (Note that the parameter in defining the set $Z(a)$ of positions is the word $b_1 \cdots b_N$.) Then we can assign those positions in $Z(a)$ the data values from $\bigcup_{S \ni a} \xi^{-1}(S)$. Such an assignment is possible as $|Z(a)| = \#_{v_1}(a) \ge \sum_{S \ni a} m_S$.

2. Then, we define the data values $d_{N+1}, d_{N+2}, \ldots$ for the positions $i$ where $b_i \in \Sigma$. This is easy. We just pick some arbitrary data values from $\bigcup_{S \ni b_i} \xi^{-1}(S)$.

3. At this stage we have defined all the data values $d_i$ for the positions $i$ labeled with symbols from $\Sigma$ in $v_1 v_2$. What is left is to define the data values for the positions in $v_1 v_2$ whose labels are from $\Sigma(\mathcal{S}_\infty)$. Here we will use the data values in $D_S$, and the assignment is done inductively. For each data value $d$ in $D_S$ that has not appeared yet in $w$, we pick $|S|$ positions $l_1, \ldots, l_{|S|}$ in $w$ such that $\{a_{l_1}, \ldots, a_{l_{|S|}}\} = S$ and which have no data values yet. Then, we assign the data value $d$ to all those positions. By the acceptance criteria of the Büchi automaton $\tilde{\mathcal{A}}$, there are infinitely many such positions for each $S \in \mathcal{S}_\infty$. Thus, such an assignment is always possible.

What remains now is to prove that $w \models \mathcal{C}$.

By Proposition 1 and the construction of $\mathcal{S}_0$, as well as the Presburger formula $\varphi$, it is immediate that $w$ satisfies the inclusion- and denial-constraints in $\mathcal{C}$. We will show that it also satisfies the key-constraints.

Suppose the key-constraint $V(a) \mapsto a \in \mathcal{C}$. First, in the assignment of data values in $w[\le N]$ all the $a$-positions receive different data values, due to the constraint $|Z(a)| = \#_{v_1}(a) = \sum_{S \ni a} m_S$. Second, from the construction of the automaton $\tilde{\mathcal{A}}$, the symbol $a$ does not appear in $b_{N+1} b_{N+2} \cdots$, and thus does not appear in $w[\ge N + 1]$. This means that we do not assign any data values from $\bigcup_{S \ni a} \xi^{-1}(S)$ to any $a$-position $\ge N + 1$, so all data values in $\bigcup_{S \ni a} \xi^{-1}(S)$ appear only once in $a$-positions. Lastly, all the data values in $[S]_w$ for each $S \in \mathcal{S}_\infty$ are assigned only once. Thus, it follows that all $a$-positions in $w$ have different data values, that is, $w \models V(a) \mapsto a$. This completes the proof of Claim 2.

C The NP algorithm for Theorem 3

We observe that in our algorithm in Subsection 3.2 the exponential blow-up occurs in Step (1), where we have to enumerate all the non-empty subsets of $\Sigma$. In particular, the size of the set $\mathcal{S}_\infty$ determines the sizes of the new alphabet $\tilde{\Sigma}$ and the transition system $\tilde{M}$, and the size of the set $\mathcal{S}_{fin}$ determines the size of the Presburger formula $\varphi$.

The main idea of our NP algorithm is that if there is no key-constraint in $\mathcal{C}$, then the following holds. There exists a subset $\mathcal{Z} \subseteq 2^\Sigma$ of polynomial size such that there exists a data $\omega$-word $w \in \mathcal{L}(\mathcal{A}, \mathcal{C})$ if and only if there exists a data $\omega$-word $w' \in \mathcal{L}(\mathcal{A}, \mathcal{C})$, where $[S]_{w'} = \emptyset$, for all $S \notin \mathcal{Z}$. This means that in the constructions of $\tilde{\Sigma}$, $\tilde{M}$, and $\varphi$, we only need to take into account the sets in $\mathcal{Z}$. We explain this idea in the following subsections.

C.1 Preliminary notion

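Let $\mathcal{C}$ be a collection of inclusion- and denial-constraints. We define the subset $\mathcal{S}_0(\mathcal{C}) \subseteq 2^\Sigma$ as follows.

1. If $\mathcal{C}$ contains the inclusion-constraint $V(a) \subseteq \bigcup_{b \in R} V(b)$, then $S \in \mathcal{S}_0(\mathcal{C})$ for all $S \subseteq \Sigma$ where $a \in S$ and $S \cap R = \emptyset$.

2. If $\mathcal{C}$ contains the denial-constraint $V(a) \cap V(b) = \emptyset$, then $S \in \mathcal{S}_0(\mathcal{C})$ for all $S \subseteq \Sigma$ where $S$ contains both $a$ and $b$.

Remark 10 Given a non-empty set $S \subseteq \Sigma$, we can decide in polynomial time whether $S \in \mathcal{S}_0(\mathcal{C})$.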
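The polynomial-time test of Remark 10 amounts to one pass over the constraints. Below is a minimal sketch; the encoding of constraints as Python tuples and the name `in_S0` are assumptions made for illustration, not notation from the paper.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical encodings (assumptions of this sketch):
Inclusion = Tuple[str, FrozenSet[str]]  # (a, R) stands for V(a) ⊆ ⋃_{b∈R} V(b)
Denial = Tuple[str, str]                # (a, b) stands for V(a) ∩ V(b) = ∅


def in_S0(S: Set[str], inclusions: List[Inclusion], denials: List[Denial]) -> bool:
    """Decide whether the subset S belongs to S_0(C)."""
    # Rule 1: an inclusion-constraint on a, with a in S and S disjoint from R.
    for a, R in inclusions:
        if a in S and not (S & R):
            return True
    # Rule 2: a denial-constraint on a and b, with both a and b in S.
    for a, b in denials:
        if a in S and b in S:
            return True
    return False
```

Each constraint is inspected once and each inspection is a set lookup or intersection, so the running time is polynomial in $|S|$ and the size of $\mathcal{C}$.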

C.2 The algorithm 

Given an ADC $(\mathcal{A}, \mathcal{C})$, where $\mathcal{C}$ does not contain key-constraints, the algorithm works as follows.

1. Construct (non-deterministically) a function $f : \Sigma \to 2^\Sigma$ such that for each $a \in \Sigma$, either

$a \in f(a)$ and $f(a) \notin \mathcal{S}_0(\mathcal{C})$

or

$f(a) = \emptyset$.

Such a function can be constructed non-deterministically by guessing $f(a)$ for each $a \in \Sigma$ and verifying deterministically (in polynomial time) that $f(a) \notin \mathcal{S}_0(\mathcal{C})$.

2. Divide (non-deterministically) $\mathrm{Image}(f)$ into two categories: $\mathcal{S}_{fin}$ and $\mathcal{S}_\infty$.

The intended meaning of $\mathcal{S}_{fin}$ and $\mathcal{S}_\infty$ is the same as in the algorithm in Subsection 3.2. Every other subset not in $\mathcal{S}_{fin} \cup \mathcal{S}_\infty$ is considered to be in $\mathcal{S}_0$.

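3. Define the alphabet $\tilde{\Sigma} = \Sigma \cup \Sigma(\mathcal{S}_\infty)$, where

\[ \Sigma(\mathcal{S}_\infty) = \{(a, S) \mid a \in S \text{ and } S \in \mathcal{S}_\infty\}, \]

and a transition system $\tilde{M} = (\tilde{Q}, \tilde{\mu})$ over the alphabet $\tilde{\Sigma}$, as follows.

\[ \tilde{Q} = Q, \qquad \tilde{\mu} = \mu \cup \{(p, (a, S), q) \mid (p, a, q) \in \mu \text{ and } (a, S) \in \Sigma(\mathcal{S}_\infty)\} \]

4. Non-deterministically choose one state $q \in Q$.

5. Construct a Presburger automaton $(\mathcal{A}_{fin}, \varphi)$, where $\mathcal{A}_{fin} = \tilde{M}_{q_0}^{\{q\}}$ and the formula $\varphi$ is as follows.

Let $\mathrm{Image}(f) = \{S_1, \ldots, S_l\}$. Then $\varphi$ is of the form $\exists z_{S_1} \cdots \exists z_{S_l}\, \psi$, where $\psi$ is the following quantifier-free formula:

\[ \bigwedge_{a \in \Sigma} \Big( x_a \ge \sum_{S \ni a \text{ and } S \in \mathrm{Image}(f)} z_S \Big) \;\wedge\; \bigwedge_{S \in \mathcal{S}_\infty} z_S = 0 \;\wedge\; \bigwedge_{S \in \mathcal{S}_{fin}} z_S \ge 1 \]

6. Construct a Büchi automaton $\tilde{\mathcal{A}}$, which is simply the intersection of $\tilde{M}_{q}^{F}$ with the automaton that checks that each $(a, S) \in \Sigma(\mathcal{S}_\infty)$ appears infinitely many times.

7. Test the emptiness of $\mathcal{L}(\mathcal{A}_{fin}, \varphi)$ and $\mathcal{L}(\tilde{\mathcal{A}})$.

Then, $\mathcal{L}(\mathcal{A}, \mathcal{C}) \neq \emptyset$ if and only if $\mathcal{L}(\mathcal{A}_{fin}, \varphi) \neq \emptyset$ and $\mathcal{L}(\tilde{\mathcal{A}}) \neq \emptyset$.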
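Step (3) is a purely syntactic extension of the alphabet and the transition relation. The following sketch illustrates it under the assumption that transitions are stored as (source, symbol, target) triples and subsets of the alphabet as frozensets; the function name `extend` is hypothetical.

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

Subset = FrozenSet[str]
Transition = Tuple[Hashable, Hashable, Hashable]  # (source, symbol, target)


def extend(sigma: Set[str], mu: Set[Transition], s_inf: Iterable[Subset]):
    """Return the extended alphabet and transition relation of Step (3)."""
    # Sigma(S_inf) = {(a, S) | a in S and S in S_inf}
    pairs = {(a, S) for S in s_inf for a in S}
    new_sigma = set(sigma) | pairs
    # Every a-transition is duplicated for each new symbol (a, S).
    new_mu = set(mu) | {
        (p, (b, S), q) for (p, a, q) in mu for (b, S) in pairs if b == a
    }
    return new_sigma, new_mu
```

The state set is unchanged, so the blow-up is confined to the alphabet: it grows by at most $|\Sigma| \cdot |\mathcal{S}_\infty|$ symbols, which is polynomial when $\mathcal{S}_\infty \subseteq \mathrm{Image}(f)$.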

C.3 The proof of correctness 

In view of Claims 1 and 2, to prove the correctness of our algorithm, it is sufficient to prove the following.

Claim 3 If a data $\omega$-word $w \models \mathcal{C}$, then there exist a function $f : \Sigma \to 2^\Sigma$ that respects the condition in Step (1) and a data $\omega$-word $v \models \mathcal{C}$ such that $\mathrm{Proj}(v) = \mathrm{Proj}(w)$ and $[S]_v = \emptyset$, for all $S \notin \mathrm{Image}(f)$.



Proof. Let

\[ w = \binom{a_1}{d_1} \binom{a_2}{d_2} \cdots \]

We define the function $f$ as follows. For each $a \in \Sigma$,

• if the label $a$ does not appear in $w$, then $f(a) = \emptyset$;

• otherwise, define $f(a) = S_a$ such that $a \in S_a$ and $[S_a]_w \neq \emptyset$.

Such a set $S_a$ exists as there is at least one $a$-position in $w$ and this position has a data value in $V_w(a)$, which is partitioned into $\bigcup_{S \ni a} [S]_w$.

We define the data word $v$ as follows.

\[ v = \binom{a_1}{d'_1} \binom{a_2}{d'_2} \cdots \]

Thus, $\mathrm{Proj}(v) = \mathrm{Proj}(w)$. We define the data values $d'_1, d'_2, \ldots$ as follows.

• If $d_i \in [S]_w$, for some $S \in \mathrm{Image}(f)$, then $d'_i = d_i$.

• If $d_i \notin [S]_w$, for all non-empty $S \in \mathrm{Image}(f)$, then we pick an arbitrary data value from $[f(a_i)]_w$ to assign to $d'_i$.

By this construction, we have $[S]_v = [S]_w$, for all non-empty $S \in \mathrm{Image}(f)$. By Proposition 1, $v \models \mathcal{C}$. Furthermore, $[S]_v = \emptyset$, for all $S \notin \mathrm{Image}(f)$. This completes the proof of our claim. □
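The construction in this proof can be tried out on a finite word. The sketch below assumes that $[S]_w$ is the set of data values that occur in $w$ with exactly the labels in $S$ (the definition is given earlier in the paper and is restated here as an assumption), that data words are finite lists of (label, data value) pairs, and that $f(a)$ is chosen from the first $a$-position. The names `label_sets` and `restrict` are illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

DataWord = List[Tuple[str, int]]  # (label, data value) pairs; finite words only


def label_sets(w: DataWord) -> Dict[int, FrozenSet[str]]:
    """Map each data value d to the set S of labels it occurs with (d ∈ [S]_w)."""
    sets: Dict[int, set] = {}
    for a, d in w:
        sets.setdefault(d, set()).add(a)
    return {d: frozenset(S) for d, S in sets.items()}


def restrict(w: DataWord):
    """Build f and v as in the proof of Claim 3 (labels absent from w get no entry)."""
    cls = label_sets(w)
    f: Dict[str, FrozenSet[str]] = {}
    for a, d in w:
        # The class of the first a-position contains a and is non-empty.
        f.setdefault(a, cls[d])
    image = set(f.values())
    # One representative data value per class [S]_w.
    rep: Dict[FrozenSet[str], int] = {}
    for d, S in cls.items():
        rep.setdefault(S, d)
    # Keep values whose class is in Image(f); redirect all others into [f(a)]_w.
    v = [(a, d if cls[d] in image else rep[f[a]]) for a, d in w]
    return f, v
```

For example, on `[("a", 1), ("b", 1), ("a", 2), ("b", 3)]` both labels are mapped to the class `{a, b}`, and the values 2 and 3 are replaced by 1, so only one class remains non-empty.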

D Proof of Theorem 5

For the sake of presentation, we first show the decidability of a simpler version of the problem Omega-SAT-zonal-automata, which we call Omega-SAT-locally-different, in Subsection D.1. Then, in Subsection D.2 we explain how to adapt the approach in Subsection D.1 for Omega-SAT-zonal-automata.

D.1 Locally different data $\omega$-words

A data word $w = \binom{a_1}{d_1} \binom{a_2}{d_2} \cdots$ is called locally different if each position has a different data value from its left- and right-neighbors, that is, $d_i \neq d_{i+1}$, for each $i = 1, 2, \ldots$.

In this section we give an algorithm to decide the problem Omega-SAT-locally-different defined below.

Problem: Omega-SAT-locally-different
Input: a Büchi automaton $\mathcal{A}$ and a collection $\mathcal{C}$ of key-, inclusion- and denial-constraints
Question: is there a locally different data $\omega$-word $w$ such that $\mathrm{Proj}(w) \in \mathcal{L}(\mathcal{A})$ with an accepting run $\rho$ and $\rho(w) \models \mathcal{C}$?
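The locally-different condition is a simple check on adjacent positions. The sketch below illustrates it on finite data words, represented (as an assumption of this sketch) as Python lists of (label, data value) pairs; the function name is illustrative.

```python
from typing import List, Tuple

DataWord = List[Tuple[str, int]]  # (label, data value) pairs; finite words only


def is_locally_different(w: DataWord) -> bool:
    """True if every position has a data value different from its right neighbor."""
    return all(w[i][1] != w[i + 1][1] for i in range(len(w) - 1))
```

Checking each position against its right neighbor suffices, since this also covers the left neighbor of the following position.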

In the proof we will use the following simple lemma.

Lemma 11 ([8, Lemma 3]) Let $v$ be a finite data word over $\Sigma$. Suppose that for each $a \in \Sigma$, either $V_v(a) = \emptyset$ or $|V_v(a)| \ge |\Sigma| + 3$. Then we can rearrange the positions of the data values in $v$ such that the resulting data word $w$ is locally different, $\mathrm{Proj}(w) = \mathrm{Proj}(v)$ and for each $a \in \Sigma$, $V_w(a) = V_v(a)$.

What this lemma tells us is that when the number of data values found in $a$-positions is big enough, for each $a \in \Sigma$, then to solve Omega-SAT-locally-different it is sufficient to solve Omega-SAT-ADC. Lemma 11 then allows us to rearrange the data values in the solution of Omega-SAT-ADC to be locally different.

In the rest of this section, the symbol $\varepsilon$ denotes the constant $|\Sigma| + 3$. The main idea roughly follows the one in the previous section, with the notable exception that for a data $\omega$-word $w$, we divide the non-empty subsets $S \subseteq \Sigma$ into four categories:

• $\mathcal{S}_0(w) = \{S \mid [S]_w = \emptyset\}$.

• $\mathcal{S}_{fin}^{<\varepsilon}(w) = \{S \mid [S]_w \text{ is a finite set of cardinality} < \varepsilon\}$.

• $\mathcal{S}_{fin}^{\ge\varepsilon}(w) = \{S \mid [S]_w \text{ is a finite set of cardinality} \ge \varepsilon\}$.

• $\mathcal{S}_\infty(w) = \{S \mid [S]_w \text{ is an infinite set}\}$.

Note that in a data $\omega$-word $w$, for $a \in S$ and $S \in \mathcal{S}_{fin}^{\ge\varepsilon}(w)$, we have $|V_w(a)| \ge \varepsilon$. This will allow us to apply Lemma 11 for $V_w(a)$, where $a \in S$ and $S \in \mathcal{S}_{fin}^{\ge\varepsilon}(w)$. On the other hand, the data values in the sets $[S]_w$, where $S \in \mathcal{S}_{fin}^{<\varepsilon}(w)$, can be regarded as fixed constants and thus can be embedded as part of the input alphabet. This is our main idea to solve Omega-SAT-locally-different.

The details are as follows. Given an input $(\mathcal{A}, \mathcal{C})$, our algorithm does the following.

1. Guess a partition $\mathcal{S}_0, \mathcal{S}_{fin}^{<\varepsilon}, \mathcal{S}_{fin}^{\ge\varepsilon}, \mathcal{S}_\infty$ of the set $2^\Sigma - \{\emptyset\}$ as in the algorithm in Subsection 3.2.

That is, it respects the following conditions.

C1. If the inclusion-constraint $V(a) \subseteq \bigcup_{b \in R} V(b)$ is in $\mathcal{C}$, then all the sets $S$, where $a \in S$ and $S \cap R = \emptyset$, are in $\mathcal{S}_0$.

C2. If the denial-constraint $V(a) \cap V(b) = \emptyset$ is in $\mathcal{C}$, then all the sets $S$ which contain both $a$ and $b$ are in $\mathcal{S}_0$.

2. Then, for each $S \in \mathcal{S}_{fin}^{<\varepsilon}$, we further guess a non-zero constant $K_S < \varepsilon$ and fix a set $\Gamma_S$ of $K_S$ constants. Define



The intention is that we only need to consider the data $\omega$-words $w$ in which $[S]_w = \Gamma_S$, for each $S \in \mathcal{S}_{fin}^{<\varepsilon}$.

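3. Let $\Sigma(\mathcal{S}_\infty) = \{(a, S) \mid a \in S \text{ and } S \in \mathcal{S}_\infty\}$. Construct the new alphabet $\tilde{\Sigma}$, where

\[ \Sigma(\mathcal{S}_{fin}^{<\varepsilon}) = \{(a, d) \mid a \in S \text{ and } d \in \Gamma_S \text{ where } S \in \mathcal{S}_{fin}^{<\varepsilon}\} \]

\[ \tilde{\Sigma} = \Sigma \cup \Sigma(\mathcal{S}_{fin}^{<\varepsilon}) \cup \Sigma(\mathcal{S}_\infty) \]

and the new transition system $\tilde{M} = (\tilde{Q}, \tilde{\mu})$ is defined as:

\[ \tilde{Q} = Q \]

\[ \tilde{\mu} = \mu \cup \{(p, (a, S), q) \mid (p, a, q) \in \mu \text{ and } (a, S) \in \Sigma(\mathcal{S}_\infty)\} \cup \{(p, (a, d), q) \mid (p, a, q) \in \mu \text{ and } (a, d) \in \Sigma(\mathcal{S}_{fin}^{<\varepsilon})\} \]

4. Non-deterministically choose one state $q \in Q$.

5. Construct a Presburger automaton $(\mathcal{A}_{fin}, \varphi)$ as follows.

(a) The automaton $\mathcal{A}_{fin}$ is $\tilde{M}_{q_0}^{\{q\}}$ intersected with an automaton that checks the following properties:

• If two symbols $(a, d_1), (b, d_2) \in \Sigma(\mathcal{S}_{fin}^{<\varepsilon})$ appear in two consecutive positions, then $d_1 \neq d_2$.

• If the key-constraint $V(a) \mapsto a \in \mathcal{C}$, then each symbol $(a, d) \in \Sigma(\mathcal{S}_{fin}^{<\varepsilon})$ appears at most once.

(b) The Presburger formula is defined as follows. Let $S_1, \ldots, S_m$ be the enumeration of the non-empty subsets of $\Sigma$, where $m = 2^{|\Sigma|} - 1$. The formula $\varphi$ is of the form $\exists z_{S_1} \cdots \exists z_{S_m}\, \psi$, where $\psi$ is the following quantifier-free formula: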

aGS 59a 

A [\ zs = Q ^ l\ zs>i 
5G-Sou<s<;u<Soo 5e5|," 

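6. Construct a Büchi automaton $\tilde{\mathcal{A}}$ as follows.

The Büchi automaton $\tilde{\mathcal{A}}$ is simply the intersection of $\tilde{M}_{q}^{F}$ with the automaton that checks the following conditions.

(a) If two symbols $(a, d_1), (b, d_2) \in \Sigma(\mathcal{S}_{fin}^{<\varepsilon})$ appear in two consecutive positions, then $d_1 \neq d_2$.

(b) Each $(a, S) \in \Sigma(\mathcal{S}_\infty)$ appears infinitely many times.

(c) If the key-constraint $V(a) \mapsto a \in \mathcal{C}$, then the symbols $a$ and $(a, d) \in \Sigma \times \Gamma_S$, for some $S \in \mathcal{S}_{fin}^{<\varepsilon}$, do not appear.

7. Test the emptiness of $\mathcal{L}(\mathcal{A}_{fin}, \varphi)$ and $\mathcal{L}(\tilde{\mathcal{A}})$.

Then, $\mathcal{L}(\mathcal{A}, \mathcal{C}) \neq \emptyset$ if and only if $\mathcal{L}(\mathcal{A}_{fin}, \varphi) \neq \emptyset$ and $\mathcal{L}(\tilde{\mathcal{A}}) \neq \emptyset$.

The sizes of the automaton $\mathcal{A}_{fin}$, the formula $\varphi$ and the Büchi automaton $\tilde{\mathcal{A}}$ are all exponential in the size of $(\mathcal{A}, \mathcal{C})$, which establishes the NEXPTime upper bound for Omega-SAT-locally-different. The proof of correctness is similar to the proofs of Claims 1 and 2. Lemma 11 ensures that we get a locally different data $\omega$-word. For the constant data values from $\Sigma(\mathcal{S}_{fin}^{<\varepsilon})$, the automata $\mathcal{A}_{fin}$ and $\tilde{\mathcal{A}}$ already ensure that none of them appears in two consecutive positions.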

D.2 The algorithm for Omega-SAT-zonal-automata

Now we explain how the algorithm for Omega-SAT-locally-different can be adapted for Omega-SAT-zonal-automata. It works as follows.

Given a zonal automaton $\mathcal{A}$ and a collection $\mathcal{C}$ of data-constraints over the alphabet $2^\Sigma$, the algorithm does the following. It guesses whether there exists a zonal word with infinitely many zones. If there is one, then the algorithm for Omega-SAT-locally-different can be adapted in a straightforward manner. Otherwise, it does the following. Let $M = (Q, \mu)$ be the transition system of $\mathcal{A}$, where $\mathcal{A} = M_{q_0}^{F}$.

1. Guess a state $q \in Q$.

2. The Presburger automaton $(\mathcal{A}_{fin}, \varphi)$ is given by $\mathcal{A}_{fin} = M_{q_0}^{\{q\}}$ over the alphabet $\Sigma \cup 2^\Sigma$, and the formula $\varphi$ can be constructed as in Step (5) of the algorithm in Subsection D.1, but over the alphabet $2^\Sigma$.

3. The Büchi automaton $\tilde{\mathcal{A}}$ is simply $M_{q}^{F}$ intersected with an automaton that checks that the symbols from $2^\Sigma$ do not appear.

The intuition is that since there are only finitely many zones, all the zones and their data-constraints are taken care of by the Presburger automaton $(\mathcal{A}_{fin}, \varphi)$. The Büchi automaton $\tilde{\mathcal{A}}$ simply makes sure that the last zone has the property desired by the original Büchi automaton $\mathcal{A}$.

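E The formal semantics of LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$]

Formally the semantics of LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] is given as follows. Let $w = \binom{a_1}{d_1} \binom{a_2}{d_2} \cdots$ and $i \in \{1, 2, \ldots\}$.

• $w, i \models \mathrm{True}$ and $w, i \not\models \mathrm{False}$;

• $w, i \models a$ if and only if $a_i = a$;

• $w, i \models \varphi \vee \psi$ if and only if $w, i \models \varphi$ or $w, i \models \psi$;

• $w, i \models \neg\varphi$ if and only if $w, i \models \varphi$ is not true;

• $w, i \models X\varphi$ if and only if $i + 1 \le n$ and $w, i + 1 \models \varphi$;

• $w, i \models X^{=}\varphi$ if and only if $i + 1 \le n$ and $d_i = d_{i+1}$ and $w, i + 1 \models \varphi$;

• $w, i \models X^{\neq}\varphi$ if and only if $i + 1 \le n$ and $d_i \neq d_{i+1}$ and $w, i + 1 \models \varphi$;

• $w, i \models \varphi\,U\,\psi$ if and only if there exists $j \ge i$ such that for all $i' = i, \ldots, j - 1$, $w, i' \models \varphi$ and $w, j \models \psi$;

• $w, i \models \varphi\,R\,\psi$ if and only if for every $j \ge i$ such that $w, j \not\models \psi$, there exists $i' \in \{i, \ldots, j - 1\}$ with $w, i' \models \varphi$;

• $w, i \models \Diamond^w\varphi$ if and only if there exists $j$ such that $d_j = d_i$ and $w, j \models \varphi$;

• $w, i \models \Diamond^s\varphi$ if and only if there exists $j \neq i$ such that $d_j = d_i$ and $w, j \models \varphi$.

An LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] formula $\varphi$ defines a data language via $\mathcal{L}(\varphi) = \{w \mid w, 1 \models \varphi\}$.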
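The semantics above can be turned into a small evaluator for finite data words. The sketch below is only an illustration: the encoding of formulae as nested Python tuples, the operator tags (`"Dw"` for $\Diamond^w$, `"Ds"` for $\Diamond^s$, `"Xeq"`, `"Xneq"`), and the 0-based positions are assumptions of this sketch, and the Release operator is omitted for brevity.

```python
from typing import List, Tuple

DataWord = List[Tuple[str, int]]  # (label, data value) pairs; finite words only


def holds(w: DataWord, i: int, f: tuple) -> bool:
    """Does formula f hold at 0-based position i of the finite data word w?"""
    op, n = f[0], len(w)
    if op == "true":
        return True
    if op == "false":
        return False
    if op == "atom":
        return w[i][0] == f[1]
    if op == "not":
        return not holds(w, i, f[1])
    if op == "or":
        return holds(w, i, f[1]) or holds(w, i, f[2])
    if op == "and":
        return holds(w, i, f[1]) and holds(w, i, f[2])
    if op in ("X", "Xeq", "Xneq"):
        if i + 1 >= n:  # no next position
            return False
        if op == "Xeq" and w[i][1] != w[i + 1][1]:
            return False
        if op == "Xneq" and w[i][1] == w[i + 1][1]:
            return False
        return holds(w, i + 1, f[1])
    if op == "U":
        return any(
            holds(w, j, f[2]) and all(holds(w, k, f[1]) for k in range(i, j))
            for j in range(i, n)
        )
    if op in ("Dw", "Ds"):
        # Weak: any position with the same data value; strong: any other position.
        return any(
            w[j][1] == w[i][1] and holds(w, j, f[1])
            for j in range(n)
            if op == "Dw" or j != i
        )
    raise ValueError(f"unknown operator: {op}")


def G(f: tuple) -> tuple:
    """G f as the abbreviation not F(not f), with F g = True U g."""
    return ("not", ("U", ("true",), ("not", f)))


def implies(f: tuple, g: tuple) -> tuple:
    return ("or", ("not", f), g)
```

With `a = ("atom", "a")`, the formula of Example 1 is `G(implies(a, ("not", ("Ds", a))))`; evaluating it at position 0 of a finite word checks that no two $a$-positions share a data value.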

F Proofs of the upper bounds in Theorem 9

We first establish a normal form for formulae in LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$]. A formula $\varphi$ is in normal form if, for every subformula of $\varphi$ that starts with a negation, say $\neg\psi$, the formula $\psi$ is either $a \in \Sigma$, or $\Diamond^s\psi'$, or $\Diamond^w\psi'$, for some $\psi'$.

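Proposition 12 Every formula $\varphi$ in LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$] can be converted to its equivalent normal form $\tilde{\varphi}$ in linear time.

Proof. The construction of $\tilde{\varphi}$ is done inductively.

• If $\varphi$ does not start with a negation, then $\tilde{\varphi}$ is precisely $\varphi$.

• If $\varphi$ is of the form $\neg\neg\psi$, then $\tilde{\varphi}$ is $\psi$.

• If $\varphi$ is of the form $\neg(\psi \vee \psi')$, then $\tilde{\varphi}$ is $\neg\psi \wedge \neg\psi'$.

• If $\varphi$ is of the form $\neg(\psi \wedge \psi')$, then $\tilde{\varphi}$ is $\neg\psi \vee \neg\psi'$.

• If $\varphi$ is of the form $\neg X\psi$, then $\tilde{\varphi}$ is $X\neg\psi$.

• If $\varphi$ is of the form $\neg(\psi\,U\,\psi')$, then $\tilde{\varphi}$ is $\neg\psi\,R\,\neg\psi'$.

• If $\varphi$ is of the form $\neg(\psi\,R\,\psi')$, then $\tilde{\varphi}$ is $\neg\psi\,U\,\neg\psi'$.

• If $\varphi$ is of the form $\neg(X^{=}\psi)$, then $\tilde{\varphi}$ is $(X^{\neq}\mathrm{True}) \vee X^{=}\neg\psi$.

• If $\varphi$ is of the form $\neg(X^{\neq}\psi)$, then $\tilde{\varphi}$ is $(X^{=}\mathrm{True}) \vee X^{\neq}\neg\psi$.

• If $\varphi$ is of the form $\neg\Diamond^w\psi$, then $\tilde{\varphi}$ is $\neg\Diamond^w\psi$.

• If $\varphi$ is of the form $\neg\Diamond^s\psi$, then $\tilde{\varphi}$ is $\neg\Diamond^s\psi$.

That $\tilde{\varphi}$ and $\varphi$ are equivalent is straightforward. □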



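Remark 13 It is straightforward from the construction of $\tilde{\varphi}$ that $\tilde{\varphi}$ stays in the same class as $\varphi$. That is,

• if $\varphi \in$ LTL[$\Diamond^w$], then $\tilde{\varphi} \in$ LTL[$\Diamond^w$];

• if $\varphi \in$ LTL[$\Diamond^s$], then $\tilde{\varphi} \in$ LTL[$\Diamond^s$]; and

• if $\varphi \in$ LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$], then $\tilde{\varphi} \in$ LTL[$\Diamond^w, \Diamond^s, X^{=}, X^{\neq}$].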
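The rewriting rules of Proposition 12 can be sketched as a recursive function that pushes negations inward. The encoding of formulae as nested tuples and the operator tags (`"Dw"`, `"Ds"`, `"Xeq"`, `"Xneq"`) are assumptions of this sketch. Two details go beyond the rules as stated: the function recurses into subformulae so that the result is in normal form everywhere, and it rewrites $\neg\mathrm{True}$ and $\neg\mathrm{False}$ to $\mathrm{False}$ and $\mathrm{True}$.

```python
def neg(f: tuple) -> tuple:
    return ("not", f)


def nf(f: tuple) -> tuple:
    """Push negations inward until they precede only atoms, Dw or Ds."""
    op = f[0]
    if op in ("true", "false", "atom"):
        return f
    if op != "not":
        # Not a negation: normalize the subformulae.
        return (op,) + tuple(nf(g) for g in f[1:])
    g = f[1]
    gop = g[0]
    if gop == "atom":
        return f
    if gop == "true":
        return ("false",)  # assumption of this sketch, not a rule from the text
    if gop == "false":
        return ("true",)   # assumption of this sketch, not a rule from the text
    if gop == "not":
        return nf(g[1])
    if gop == "or":
        return ("and", nf(neg(g[1])), nf(neg(g[2])))
    if gop == "and":
        return ("or", nf(neg(g[1])), nf(neg(g[2])))
    if gop == "X":
        return ("X", nf(neg(g[1])))
    if gop == "U":
        return ("R", nf(neg(g[1])), nf(neg(g[2])))
    if gop == "R":
        return ("U", nf(neg(g[1])), nf(neg(g[2])))
    if gop == "Xeq":
        return ("or", ("Xneq", ("true",)), ("Xeq", nf(neg(g[1]))))
    if gop == "Xneq":
        return ("or", ("Xeq", ("true",)), ("Xneq", nf(neg(g[1]))))
    if gop in ("Dw", "Ds"):
        # Negation may stay in front of the data operators.
        return ("not", (gop, nf(g[1])))
    raise ValueError(f"unknown operator: {gop}")
```

Each rule introduces only a constant number of new nodes per negated node, which is consistent with the linear bound claimed in the proposition.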

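F.1 The NEXPTime upper bound for part (1) of Theorem 9

By Proposition 12 we can assume that the input formula is always in normal form. The proof of decidability itself is done by translating the input formula $\varphi \in$ LTL[$\Diamond^w$] to an equivalent ADC $(\mathcal{A}_\varphi, \mathcal{C}_\varphi)$. The translation closely follows the classical translation from standard LTL to Büchi automata. (See, for example, [11].) So we simply sketch it here. We recall the standard notion of the closure of the formula $\varphi$, denoted by $\mathrm{Cl}(\varphi)$.

• $\varphi \in \mathrm{Cl}(\varphi)$.

• $a \in \mathrm{Cl}(\varphi)$, for each $a \in \Sigma$.

• If $\neg a \in \mathrm{Cl}(\varphi)$, then $\bigvee_{b \in \Sigma - \{a\}} b \in \mathrm{Cl}(\varphi)$.

• If $\varphi_1 \wedge \varphi_2 \in \mathrm{Cl}(\varphi)$, then $\varphi_1, \varphi_2 \in \mathrm{Cl}(\varphi)$.

• If $\varphi_1 \vee \varphi_2 \in \mathrm{Cl}(\varphi)$, then $\varphi_1, \varphi_2 \in \mathrm{Cl}(\varphi)$.

• If $X\varphi_1 \in \mathrm{Cl}(\varphi)$, then $\varphi_1 \in \mathrm{Cl}(\varphi)$.

• If $\varphi_1\,U\,\varphi_2 \in \mathrm{Cl}(\varphi)$, then $\varphi_1, \varphi_2 \in \mathrm{Cl}(\varphi)$.

• If $\varphi_1\,R\,\varphi_2 \in \mathrm{Cl}(\varphi)$, then $\varphi_1, \varphi_2 \in \mathrm{Cl}(\varphi)$.

• If $\Diamond^w\varphi_1 \in \mathrm{Cl}(\varphi)$, then $\varphi_1 \in \mathrm{Cl}(\varphi)$.

• If $\neg\Diamond^w\varphi_1 \in \mathrm{Cl}(\varphi)$, then $\varphi_1 \in \mathrm{Cl}(\varphi)$.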

The standard construction of $\mathcal{A}_\varphi = (Q, q_0, \mu, F)$ will yield $Q \subseteq 2^{\mathrm{Cl}(\varphi)}$, where $q \in Q$ if the following conditions hold.

(S1) $\mathrm{False} \notin q$;

(S2) $q \cap \Sigma$ is a singleton;

(S3) if $\varphi_1 \in q$, then the normal form of $\neg\varphi_1$ is not in $q$;

(S4) if the normal form of $\neg\varphi_1$ is in $q$, then $\varphi_1 \notin q$;

(S5) if $\varphi_1 \wedge \varphi_2 \in q$, then $\varphi_1, \varphi_2 \in q$;

(S6) if $\varphi_1 \vee \varphi_2 \in q$, then $\varphi_1 \in q$ or $\varphi_2 \in q$.

Intuitively, the meaning of $\mathcal{A}_\varphi$ is such that in every state $q$, it takes care that every formula in $q - \{\Diamond^w\psi \mid \psi \in \mathrm{Cl}(\varphi)\}$ holds. The construction of $q_0$, $\mu$ and $F$ is standard, as in [21], and thus omitted. The set $\mathcal{C}_\varphi$ of constraints will take care of the operator $\Diamond^w$. It consists of the following.

1. For every state $q$ that contains the sub-formula $\Diamond^w\psi$, $\mathcal{C}_\varphi$ contains the constraint:

\[ V(q) \subseteq \bigcup_{q' \ni \psi} V(q'). \]

2. For every state $q$ that contains the sub-formula $\neg\Diamond^w\psi$, $\mathcal{C}_\varphi$ contains the constraints:

\[ V(q) \cap V(q') = \emptyset, \]

for all $q'$ that contain $\psi$.

Now $\mathcal{C}_\varphi$ does not contain key-constraints. The construction of $\mathcal{A}_\varphi$ is already in EXPTime. By the NP upper bound in Theorem 3 we get the NEXPTime upper bound for the satisfiability problem of LTL[$\Diamond^w$].

F.2 The 2-NEXPTime upper bound for part (2) of Theorem 9

By Proposition 12 we assume that the input formula $\varphi$ is in normal form. Again, the proof of decidability is done by translating the input formula $\varphi \in$ LTL[$\Diamond^s$] to an equivalent ADC $(\mathcal{A}_\varphi, \mathcal{C}_\varphi)$. However, we have to make a small modification because, as explained in Example 2, formulas such as $G(a \to \Diamond^s a)$ cannot be directly translated to an ADC.

We apply the same trick as in Example 2 to make a copy $\bar{\psi}$ of each formula $\psi$ in $\mathrm{Cl}(\varphi)$. The idea is that the copy $\bar{\psi}$ has exactly the same property as the formula $\psi$. We denote by $\overline{\mathrm{Cl}(\varphi)}$ the set of all such copies.

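Then the ADC $(\mathcal{A}_\varphi, \mathcal{C}_\varphi)$ is defined over the alphabet $\Sigma \cup \bar{\Sigma}$ as follows. The automaton $\mathcal{A}_\varphi = (Q, q_0, \mu, F)$ is such that $Q \subseteq 2^{\mathrm{Cl}(\varphi) \cup \overline{\mathrm{Cl}(\varphi)}}$. A state $q \in Q$ if, in addition to Conditions (S1)-(S6) above, the following conditions hold.

• both $\mathrm{False}$ and $\overline{\mathrm{False}}$ are not in $q$;

• $q \cap (\Sigma \cup \bar{\Sigma})$ is a singleton;

• if $\varphi_1 \in q$, then both $\neg\varphi_1$ and $\overline{\neg\varphi_1}$ are not in $q$;

• if $\neg\varphi_1 \in q$, then both $\varphi_1$ and $\overline{\varphi_1}$ are not in $q$;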

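• if $\varphi_1 \wedge \varphi_2 \in q$, then either

- $\varphi_1, \varphi_2 \in q$, or

- $\overline{\varphi_1}, \varphi_2 \in q$, or

- $\varphi_1, \overline{\varphi_2} \in q$, or

- $\overline{\varphi_1}, \overline{\varphi_2} \in q$;

• if $\overline{\varphi_1 \wedge \varphi_2} \in q$, then either

- $\varphi_1, \varphi_2 \in q$, or

- $\overline{\varphi_1}, \varphi_2 \in q$, or

- $\varphi_1, \overline{\varphi_2} \in q$, or

- $\overline{\varphi_1}, \overline{\varphi_2} \in q$;

• if $\varphi_1 \vee \varphi_2 \in q$, then one of $\varphi_1, \varphi_2, \overline{\varphi_1}, \overline{\varphi_2}$ is in $q$;

• if $\overline{\varphi_1 \vee \varphi_2} \in q$, then one of $\varphi_1, \varphi_2, \overline{\varphi_1}, \overline{\varphi_2}$ is in $q$;

• if $\overline{\varphi_1} \in q$, then $\varphi_1 \notin q$;

• if $\varphi_1 \in q$, then $\overline{\varphi_1} \notin q$;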

• if O^yji G q, then either 0*</? G q, or G 

• if V'l) then O^'c^i ^ q; 

• if then O''^ ^ 

Note that in this construction the states that are supposed to contain both $\varphi_1$ and $\Diamond^s\varphi_1$ are replaced by states that contain either

• both $\varphi_1$ and $\Diamond^s\overline{\varphi_1}$, or

• both $\overline{\varphi_1}$ and $\Diamond^s\varphi_1$.

The construction of $q_0$, $\mu$ and $F$ is standard.

The collection $\mathcal{C}_\varphi$ of data-constraints consists of the following.

1. For each state $q$ that contains both the sub-formulae $\psi$ and $\neg\Diamond^s\psi$, $\mathcal{C}_\varphi$ contains:

• the key-constraint $V(q) \mapsto q$; and

• the denial-constraints $V(q) \cap V(p) = \emptyset$, for all states $p$ that contain $\bar{\psi}$.

The same holds if $q$ contains both $\bar{\psi}$ and $\neg\Diamond^s\bar{\psi}$.

2. For each state $q$ that contains the sub-formula $\Diamond^s\psi$ but not the sub-formula $\psi$, $\mathcal{C}_\varphi$ contains the inclusion-constraint

\[ V(q) \subseteq \bigcup_{p \ni \psi} V(p). \]

3. For each state $q$ that contains the sub-formula $\neg\Diamond^s\psi$ but not the sub-formula $\psi$, $\mathcal{C}_\varphi$ contains the denial-constraints

\[ V(q) \cap V(p) = \emptyset, \]

for every state $p$ that contains $\psi$.

The construction of $\mathcal{A}_\varphi$ is already in NEXPTime. By Theorem 3 we get the 2-NEXPTime upper bound for LTL[$\Diamond^s$].

F.3 The 3-NEXPTime upper bound for part (3) of Theorem 9

The local comparison operators $X^{=}$ and $X^{\neq}$ can be handled by adding profiles to the automata. As the inclusion of profile constraints induces an exponential blow-up, we get the 3-NEXPTime upper bound. The construction is straightforward and thus omitted.

G The NEXPTime-hardness of LTL[$\Diamond^w$]

In the proof of the following theorem we will use $\Box^w\varphi$ as an abbreviation of $\neg\Diamond^w\neg\varphi$.

Theorem 14 The satisfiability problem for LTL[$\Diamond^w$] on (finite and infinite) data words is NEXPTime-hard.

Proof. The proof is by reduction from the $2^n$-corridor tiling problem. An instance $I = (T, H, V, F, L, n)$ of this problem consists of a finite set $T$ of tile types, horizontal and vertical constraints $H, V \subseteq T \times T$, constraints $F, L \subseteq T$ for the first and last row, and a number $n$ given in unary. The task is to decide whether $T$ tiles the $2^n \times 2^n$-corridor, respecting the constraints. This problem is NEXPTime-hard [6].

For an arbitrary instance $I = (T, H, V, F, L, n)$ of the $2^n$-corridor tiling problem we will construct a formula $\varphi_I$ of polynomial length (in $|I|$) which is satisfiable if and only if $I$ has a solution.

We use $\Sigma = T \cup \{0, 1\} \cup \{c, r, u\}$ as the underlying alphabet. The idea is to assign to every square on the tiling grid a column and a row number to be able to check the constraints. We use data values as pointers to the binary encoding of a number. We first introduce some abbreviations. A bit is represented by two successive positions in the data word. The first one is labelled by 0 or 1, and the data value of the second position serves as a pointer to the position with the next bit. It is crucial that all positions pointed to by the same pointer carry the same bit value. The following formula ensures that this property holds. Since we will encode binary numbers with $n$ bits, the X-operator is used $n - 1$ times.

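\[ \varphi_{bitstring} = \Diamond^w\big((0 \vee 1) \wedge X\Diamond^w((0 \vee 1) \wedge X\Diamond^w(\ldots \wedge X\Diamond^w(0 \vee 1)) \ldots)\big) \;\wedge\; (\Box^w 0 \vee \Box^w 1) \wedge \Box^w X\big((\Box^w 0 \vee \Box^w 1) \wedge \Box^w X((\Box^w 0 \vee \Box^w 1) \wedge \ldots \wedge \Box^w X(\Box^w 0 \vee \Box^w 1)) \ldots\big) \]

The next formula encodes the number 0 in binary.

\[ \varphi_0 = \Box^w(0 \wedge X\Box^w(0 \wedge X\Box^w(\ldots \wedge X\Box^w 0) \ldots)) \]

The following encodes the number $2^n - 1$.

\[ \varphi_1 = \Box^w(1 \wedge X\Box^w(1 \wedge X\Box^w(\ldots \wedge X\Box^w 1) \ldots)) \]

The next formula says that the $i$th bit encodes the bit value $b$. The expression $(X\Box^w)^{i-1}$ means that $X\Box^w$ is repeated $i - 1$ times.

\[ \mathrm{bit}\text{-}i\text{-}b = \Box^w (X\Box^w)^{i-1}\, b \]

for $1 \le i \le n$ and $b \in \{0, 1\}$.

It should be noted that the first bit serves as the lowest bit.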

The formula $\varphi_I$ is composed of the formulas $\psi$ and $\chi$. The formula $\psi$ describes the encoding of the tiling grid, and $\chi$ describes the constraints which have to hold.

Every square of the tiling is represented by a sequence of four positions in the data word. The first position is labeled by the tile type belonging to this square, the second one serves as a pointer to the bit representation of the column number of the square, the third one serves as a pointer to the bit representation of the row number of the square, and the fourth one serves as an up-pointer to the next upper square in the same column. Such a sequence of positions will be called a square encoding.

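$$
\psi_1 = G\Big[\bigvee_{t \in T} t \rightarrow X(c \wedge \varphi_{bitstring})\Big]
$$

$$
\psi_2 = G\Big[\bigvee_{t \in T} t \rightarrow XX(r \wedge \varphi_{bitstring})\Big]
$$

$$
\psi_3 = G\Big[\bigvee_{t \in T} t \rightarrow XXX(u \wedge \Box^{\sim} \bigvee_{t \in T} t)\Big]
$$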

The first $4 \cdot 2^{2n}$ positions of the word represent a list of all square encodings 
of the tiling. The list begins with the square with column number 0 and 
row number 0. After all $2^n$ squares of a row are listed, the first square of the 
next row follows. 
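
The row-by-row layout can be sketched as follows. This is only an illustration under assumptions: positions are counted from 0, and the tuples and function names are hypothetical stand-ins for the labelled positions of a square encoding.

```python
def square_start(column, row, n):
    """Index of the first (tile) position of the square encoding for (column, row)."""
    return 4 * (row * 2 ** n + column)


def layout(n):
    """List the roles of the first 4 * 2^(2n) positions, row by row."""
    side = 2 ** n
    positions = []
    for row in range(side):
        for column in range(side):
            # One square encoding: tile label, column pointer, row pointer, up-pointer.
            for role in ("tile", "c", "r", "u"):
                positions.append((role, column, row))
    return positions
```

In this indexing the next square encoding starts four positions later, and the square addressed by the up-pointer of the square at `(column, row)` starts at `square_start(column, row + 1, n)`.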

First we have to ensure that the first square encoding has row number 0 
and column number 0. 

$$
\psi_4 = X\varphi_0 \wedge XX\varphi_0
$$

A square encoding with column number $i < 2^n - 1$ and row number $j$ is 
followed by a square encoding with column number $i + 1$ and row number $j$. 

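$$
\begin{aligned}
\psi_5 = G\Big[\Big(\bigvee_{t \in T} t \wedge X\neg\varphi_1\Big) \rightarrow \bigwedge_{i=1}^{n} \Big(&\big(XX\,\textit{bit-$i$-0} \rightarrow XXXX(\bigvee_{t \in T} t \wedge XX\,\textit{bit-$i$-0})\big) \\
\wedge\; &\big(XX\,\textit{bit-$i$-1} \rightarrow XXXX(\bigvee_{t \in T} t \wedge XX\,\textit{bit-$i$-1})\big)\Big)\Big]
\end{aligned}
$$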

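$$
\begin{aligned}
\psi_6 = G\Big[\Big(\bigvee_{t \in T} t \wedge X\neg\varphi_1\Big) \rightarrow \bigvee_{i=1}^{n} \Big(&\bigwedge_{j=1}^{i-1} (X\,\textit{bit-$j$-1} \wedge XXXXX\,\textit{bit-$j$-0}) \;\wedge \\
& X\,\textit{bit-$i$-0} \wedge XXXXX\,\textit{bit-$i$-1} \;\wedge \\
& \bigwedge_{j=i+1}^{n} \big((X\,\textit{bit-$j$-0} \rightarrow XXXXX\,\textit{bit-$j$-0}) \wedge (X\,\textit{bit-$j$-1} \rightarrow XXXXX\,\textit{bit-$j$-1})\big)\Big)\Big]
\end{aligned}
$$

A square encoding with column number $2^n - 1$ and row number $j < 2^n - 1$ 
is followed by a square encoding with column number 0 and row number $j + 1$. 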

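$$
\psi_7 = G\Big[\Big(\bigvee_{t \in T} t \wedge X\varphi_1 \wedge XX\neg\varphi_1\Big) \rightarrow XXXX\Big(\bigvee_{t \in T} t \wedge X\varphi_0\Big)\Big]
$$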

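$$
\begin{aligned}
\psi_8 = G\Big[\Big(\bigvee_{t \in T} t \wedge X\varphi_1 \wedge XX\neg\varphi_1\Big) \rightarrow \bigvee_{i=1}^{n} \Big(&\bigwedge_{j=1}^{i-1} (XX\,\textit{bit-$j$-1} \wedge XXXXXX\,\textit{bit-$j$-0}) \;\wedge \\
& XX\,\textit{bit-$i$-0} \wedge XXXXXX\,\textit{bit-$i$-1} \;\wedge \\
& \bigwedge_{j=i+1}^{n} \big((XX\,\textit{bit-$j$-0} \rightarrow XXXXXX\,\textit{bit-$j$-0}) \wedge (XX\,\textit{bit-$j$-1} \rightarrow XXXXXX\,\textit{bit-$j$-1})\big)\Big)\Big]
\end{aligned}
$$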
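
The disjunction over $i$ in the two increment formulas (for the column number and for the row number) follows the usual characterisation of binary successor with the lowest bit first: all bits below position $i$ flip from 1 to 0, bit $i$ flips from 0 to 1, and all higher bits are unchanged. The following sketch checks this characterisation by brute force on plain bit lists; it is an illustration with hypothetical helper names, not a translation of the temporal formulas themselves.

```python
def to_bits(value, n):
    """Return the n-bit encoding of value, lowest bit first."""
    return [(value >> k) & 1 for k in range(n)]


def is_successor(a_bits, b_bits):
    """True iff b_bits encodes the number encoded by a_bits plus one."""
    n = len(a_bits)
    for i in range(n):
        # Bits below i: 1 in a, 0 in b.
        low = all(a_bits[j] == 1 and b_bits[j] == 0 for j in range(i))
        # Bit i: 0 in a, 1 in b.
        flip = a_bits[i] == 0 and b_bits[i] == 1
        # Bits above i: unchanged.
        high = all(a_bits[j] == b_bits[j] for j in range(i + 1, n))
        if low and flip and high:
            return True
    return False


def check_all(n):
    """Compare is_successor with ordinary arithmetic for all pairs of n-bit numbers."""
    return all(
        is_successor(to_bits(a, n), to_bits(b, n)) == (b == a + 1)
        for a in range(2 ** n)
        for b in range(2 ** n)
    )
```

Note that the all-ones number has no successor within $n$ bits, which matches the premises of the increment formulas excluding the value $2^n - 1$.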

After the square encoding with column number $2^n - 1$ and row number $2^n - 1$ 
there follow no more positions labelled with a tile type. By this we ensure 
that every square encoding occurs exactly once. 

$$
\psi_9 = G\Big[\Big(\bigvee_{t \in T} t \wedge X\varphi_1 \wedge XX\varphi_1\Big) \rightarrow XG \bigwedge_{t \in T} \neg t\Big]
$$

The up-pointer of every square encoding with column number $i$ and row 
number $j < 2^n - 1$ points to the first position of the unique square encoding 
with column number $i$ and row number $j + 1$. 

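$$
\begin{aligned}
\psi_{10} = G\Big[\Big(\bigvee_{t \in T} t \wedge XX\neg\varphi_1\Big) \rightarrow \bigwedge_{i=1}^{n} \Big(&\big(X\,\textit{bit-$i$-0} \rightarrow XXX\Box^{\sim}(\bigvee_{t \in T} t \wedge X\,\textit{bit-$i$-0})\big) \\
\wedge\; &\big(X\,\textit{bit-$i$-1} \rightarrow XXX\Box^{\sim}(\bigvee_{t \in T} t \wedge X\,\textit{bit-$i$-1})\big)\Big)\Big]
\end{aligned}
$$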

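$$
\begin{aligned}
\psi_{11} = G\Big[\Big(\bigvee_{t \in T} t \wedge XX\neg\varphi_1\Big) \rightarrow \bigvee_{i=1}^{n} \Big(&\bigwedge_{j=1}^{i-1} (XX\,\textit{bit-$j$-1} \wedge XXX\Box^{\sim}XX\,\textit{bit-$j$-0}) \;\wedge \\
& XX\,\textit{bit-$i$-0} \wedge XXX\Box^{\sim}XX\,\textit{bit-$i$-1} \;\wedge \\
& \bigwedge_{j=i+1}^{n} \big((XX\,\textit{bit-$j$-0} \rightarrow XXX\Box^{\sim}XX\,\textit{bit-$j$-0}) \wedge (XX\,\textit{bit-$j$-1} \rightarrow XXX\Box^{\sim}XX\,\textit{bit-$j$-1})\big)\Big)\Big]
\end{aligned}
$$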

The following formulas express that the constraints in $I$ are respected. 
The squares of the first row carry only tile types from $F$. 

$$
\chi_1 = G\Big[\Big(\bigvee_{t \in T} t \wedge XX\varphi_0\Big) \rightarrow \bigvee_{t \in F} t\Big]
$$

Similarly, the squares of the last row carry only tile types from $L$. 

$$
\chi_2 = G\Big[\Big(\bigvee_{t \in T} t \wedge XX\varphi_1\Big) \rightarrow \bigvee_{t \in L} t\Big]
$$

The tile type of a square and the tile type of its right neighbor respect 
the horizontal constraints. 

$$
\chi_3 = G\Big[\bigwedge_{t \in T} \Big((t \wedge X\neg\varphi_1) \rightarrow XXXX \bigvee_{(t,t') \in H} t'\Big)\Big]
$$

The tile type of a square and the tile type of its upper neighbor respect the 
vertical constraints. 

$$
\chi_4 = G\Big[\bigwedge_{t \in T} \Big((t \wedge XX\neg\varphi_1) \rightarrow XXX\Box^{\sim} \bigvee_{(t,t') \in V} t'\Big)\Big]
$$

The desired formula is $\varphi_I = \bigwedge_{i=1}^{11} \psi_i \wedge \bigwedge_{j=1}^{4} \chi_j$. It is easy to see that $\varphi_I$ 
is satisfiable if and only if $I$ has a solution. □ 